Content for Decem's Python course.
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## 11. The pandas Library"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "**pandas** is an *open source* library that provides powerful, easy-to-use data structures and data analysis tools for Python."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "It can be installed in our virtual environment with the following command:\n",
    "\n",
    "```\n",
    "pipenv install pandas\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "The alias `pd` is the de facto standard for working with **pandas**."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Series\n",
    "\n",
    "A series represents a one-dimensional sequence of data. It is created by passing a list of values to `pd.Series`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0 1.0\n",
       "1 3.0\n",
       "2 5.0\n",
       "3 NaN\n",
       "4 6.0\n",
       "5 8.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s = pd.Series([1, 3, 5, np.nan, 6, 8])\n",
    "s"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### DataFrame\n",
    "\n",
    "A `DataFrame` object represents a two-dimensional tabular structure that holds potentially heterogeneous data, with labeled rows and columns.\n",
    "\n",
    "A DataFrame can be created from a dictionary or from a NumPy `array`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [],
   "source": [
    "df = pd.DataFrame({\n",
    "    'A': [1., 2., np.nan, None],\n",
    "    'B': pd.Timestamp('20130102'),\n",
    "    'C': pd.Series(1, index=list(range(4)), dtype='float32'),\n",
    "    'D': np.array([3] * 4, dtype='int32'),\n",
    "    'E': pd.Categorical([\"test\", \"train\", \"test\", \"train\"]),\n",
    "    'F': 'foo'\n",
    "})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       " .dataframe tbody tr th:only-of-type {\n",
       " vertical-align: middle;\n",
       " }\n",
       "\n",
       " .dataframe tbody tr th {\n",
       " vertical-align: top;\n",
       " }\n",
       "\n",
       " .dataframe thead th {\n",
       " text-align: right;\n",
       " }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       " <thead>\n",
       " <tr style=\"text-align: right;\">\n",
       " <th></th>\n",
       " <th>A</th>\n",
       " <th>B</th>\n",
       " <th>C</th>\n",
       " <th>D</th>\n",
       " <th>E</th>\n",
       " <th>F</th>\n",
       " </tr>\n",
       " </thead>\n",
       " <tbody>\n",
       " <tr>\n",
       " <th>0</th>\n",
       " <td>1.0</td>\n",
       " <td>2013-01-02</td>\n",
       " <td>1.0</td>\n",
       " <td>3</td>\n",
       " <td>test</td>\n",
       " <td>foo</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>1</th>\n",
       " <td>2.0</td>\n",
       " <td>2013-01-02</td>\n",
       " <td>1.0</td>\n",
       " <td>3</td>\n",
       " <td>train</td>\n",
       " <td>foo</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>2</th>\n",
       " <td>NaN</td>\n",
       " <td>2013-01-02</td>\n",
       " <td>1.0</td>\n",
       " <td>3</td>\n",
       " <td>test</td>\n",
       " <td>foo</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>3</th>\n",
       " <td>NaN</td>\n",
       " <td>2013-01-02</td>\n",
       " <td>1.0</td>\n",
       " <td>3</td>\n",
       " <td>train</td>\n",
       " <td>foo</td>\n",
       " </tr>\n",
       " </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       " A B C D E F\n",
       "0 1.0 2013-01-02 1.0 3 test foo\n",
       "1 2.0 2013-01-02 1.0 3 train foo\n",
       "2 NaN 2013-01-02 1.0 3 test foo\n",
       "3 NaN 2013-01-02 1.0 3 train foo"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "A float64\n",
       "B datetime64[ns]\n",
       "C float32\n",
       "D int32\n",
       "E category\n",
       "F object\n",
       "dtype: object"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.dtypes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "We can pass a sequence of labels, here a `DatetimeIndex`, to use as the index (the row labels)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',\n",
       " '2013-01-05', '2013-01-06'],\n",
       " dtype='datetime64[ns]', freq='D')"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dates = pd.date_range('20130101', periods=6)\n",
    "dates"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       " .dataframe tbody tr th:only-of-type {\n",
       " vertical-align: middle;\n",
       " }\n",
       "\n",
       " .dataframe tbody tr th {\n",
       " vertical-align: top;\n",
       " }\n",
       "\n",
       " .dataframe thead th {\n",
       " text-align: right;\n",
       " }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       " <thead>\n",
       " <tr style=\"text-align: right;\">\n",
       " <th></th>\n",
       " <th>A</th>\n",
       " <th>B</th>\n",
       " <th>C</th>\n",
       " <th>D</th>\n",
       " </tr>\n",
       " </thead>\n",
       " <tbody>\n",
       " <tr>\n",
       " <th>2013-01-01</th>\n",
       " <td>-0.679399</td>\n",
       " <td>-0.564244</td>\n",
       " <td>-0.395166</td>\n",
       " <td>-0.004622</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>2013-01-02</th>\n",
       " <td>2.147829</td>\n",
       " <td>-0.991826</td>\n",
       " <td>-1.004833</td>\n",
       " <td>0.168517</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>2013-01-03</th>\n",
       " <td>0.398068</td>\n",
       " <td>-0.536610</td>\n",
       " <td>-0.773990</td>\n",
       " <td>-1.075894</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>2013-01-04</th>\n",
       " <td>-1.185011</td>\n",
       " <td>1.988697</td>\n",
       " <td>-0.770427</td>\n",
       " <td>-0.472499</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>2013-01-05</th>\n",
       " <td>-0.359634</td>\n",
       " <td>0.338176</td>\n",
       " <td>0.105786</td>\n",
       " <td>0.359107</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>2013-01-06</th>\n",
       " <td>-0.555880</td>\n",
       " <td>1.115044</td>\n",
       " <td>-2.108126</td>\n",
       " <td>0.139896</td>\n",
       " </tr>\n",
       " </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       " A B C D\n",
       "2013-01-01 -0.679399 -0.564244 -0.395166 -0.004622\n",
       "2013-01-02 2.147829 -0.991826 -1.004833 0.168517\n",
       "2013-01-03 0.398068 -0.536610 -0.773990 -1.075894\n",
       "2013-01-04 -1.185011 1.988697 -0.770427 -0.472499\n",
       "2013-01-05 -0.359634 0.338176 0.105786 0.359107\n",
       "2013-01-06 -0.555880 1.115044 -2.108126 0.139896"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2 = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))\n",
    "df2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Axes\n",
    "\n",
    "Operations on a `pandas` DataFrame can be performed along either of its two axes, selected with the `axis` argument.\n",
    "\n",
    "- With `axis=0` we refer to the index: the operation runs across all rows, so a reduction such as `sum` yields one result per column.\n",
    "- With `axis=1` the operation runs across all columns, so a reduction such as `sum` yields one result per row."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "![Axis](./img/axis.jpg)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Viewing data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       " .dataframe tbody tr th:only-of-type {\n",
       " vertical-align: middle;\n",
       " }\n",
       "\n",
       " .dataframe tbody tr th {\n",
       " vertical-align: top;\n",
       " }\n",
       "\n",
       " .dataframe thead th {\n",
       " text-align: right;\n",
       " }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       " <thead>\n",
       " <tr style=\"text-align: right;\">\n",
       " <th></th>\n",
       " <th>A</th>\n",
       " <th>B</th>\n",
       " <th>C</th>\n",
       " <th>D</th>\n",
       " </tr>\n",
       " </thead>\n",
       " <tbody>\n",
       " <tr>\n",
       " <th>2013-01-01</th>\n",
       " <td>-0.679399</td>\n",
       " <td>-0.564244</td>\n",
       " <td>-0.395166</td>\n",
       " <td>-0.004622</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>2013-01-02</th>\n",
       " <td>2.147829</td>\n",
       " <td>-0.991826</td>\n",
       " <td>-1.004833</td>\n",
       " <td>0.168517</td>\n",
       " </tr>\n",
       " </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       " A B C D\n",
       "2013-01-01 -0.679399 -0.564244 -0.395166 -0.004622\n",
       "2013-01-02 2.147829 -0.991826 -1.004833 0.168517"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2.head(2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       " .dataframe tbody tr th:only-of-type {\n",
       " vertical-align: middle;\n",
       " }\n",
       "\n",
       " .dataframe tbody tr th {\n",
       " vertical-align: top;\n",
       " }\n",
       "\n",
       " .dataframe thead th {\n",
       " text-align: right;\n",
       " }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       " <thead>\n",
       " <tr style=\"text-align: right;\">\n",
       " <th></th>\n",
       " <th>A</th>\n",
       " <th>B</th>\n",
       " <th>C</th>\n",
       " <th>D</th>\n",
       " </tr>\n",
       " </thead>\n",
       " <tbody>\n",
       " <tr>\n",
       " <th>2013-01-05</th>\n",
       " <td>-0.359634</td>\n",
       " <td>0.338176</td>\n",
       " <td>0.105786</td>\n",
       " <td>0.359107</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>2013-01-06</th>\n",
       " <td>-0.555880</td>\n",
       " <td>1.115044</td>\n",
       " <td>-2.108126</td>\n",
       " <td>0.139896</td>\n",
       " </tr>\n",
       " </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       " A B C D\n",
       "2013-01-05 -0.359634 0.338176 0.105786 0.359107\n",
       "2013-01-06 -0.555880 1.115044 -2.108126 0.139896"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2.tail(2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',\n",
       " '2013-01-05', '2013-01-06'],\n",
       " dtype='datetime64[ns]', freq='D')"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2.index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['A', 'B', 'C', 'D'], dtype='object')"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2.columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### DataFrame.to_numpy()\n",
    "\n",
    "The `.to_numpy()` method of a DataFrame returns the DataFrame's data as a `numpy` array."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[-0.67939947, -0.5642441 , -0.39516608, -0.00462202],\n",
       " [ 2.14782856, -0.99182561, -1.00483345, 0.16851747],\n",
       " [ 0.39806756, -0.53661026, -0.77399033, -1.07589368],\n",
       " [-1.18501088, 1.98869725, -0.77042661, -0.47249893],\n",
       " [-0.35963418, 0.3381756 , 0.10578614, 0.35910665],\n",
       " [-0.55588001, 1.11504445, -2.10812582, 0.13989579]])"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2.to_numpy()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Describe\n",
    "\n",
    "The `.describe()` method shows a statistical summary of the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       " .dataframe tbody tr th:only-of-type {\n",
       " vertical-align: middle;\n",
       " }\n",
       "\n",
       " .dataframe tbody tr th {\n",
       " vertical-align: top;\n",
       " }\n",
       "\n",
       " .dataframe thead th {\n",
       " text-align: right;\n",
       " }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       " <thead>\n",
       " <tr style=\"text-align: right;\">\n",
       " <th></th>\n",
       " <th>A</th>\n",
       " <th>B</th>\n",
       " <th>C</th>\n",
       " <th>D</th>\n",
       " </tr>\n",
       " </thead>\n",
       " <tbody>\n",
       " <tr>\n",
       " <th>count</th>\n",
       " <td>6.000000</td>\n",
       " <td>6.000000</td>\n",
       " <td>6.000000</td>\n",
       " <td>6.000000</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>mean</th>\n",
       " <td>-0.039005</td>\n",
       " <td>0.224873</td>\n",
       " <td>-0.824459</td>\n",
       " <td>-0.147582</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>std</th>\n",
       " <td>1.188837</td>\n",
       " <td>1.148846</td>\n",
       " <td>0.739655</td>\n",
       " <td>0.534241</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>min</th>\n",
       " <td>-1.185011</td>\n",
       " <td>-0.991826</td>\n",
       " <td>-2.108126</td>\n",
       " <td>-1.075894</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>25%</th>\n",
       " <td>-0.648520</td>\n",
       " <td>-0.557336</td>\n",
       " <td>-0.947123</td>\n",
       " <td>-0.355530</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>50%</th>\n",
       " <td>-0.457757</td>\n",
       " <td>-0.099217</td>\n",
       " <td>-0.772208</td>\n",
       " <td>0.067637</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>75%</th>\n",
       " <td>0.208642</td>\n",
       " <td>0.920827</td>\n",
       " <td>-0.488981</td>\n",
       " <td>0.161362</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>max</th>\n",
       " <td>2.147829</td>\n",
       " <td>1.988697</td>\n",
       " <td>0.105786</td>\n",
       " <td>0.359107</td>\n",
       " </tr>\n",
       " </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       " A B C D\n",
       "count 6.000000 6.000000 6.000000 6.000000\n",
       "mean -0.039005 0.224873 -0.824459 -0.147582\n",
       "std 1.188837 1.148846 0.739655 0.534241\n",
       "min -1.185011 -0.991826 -2.108126 -1.075894\n",
       "25% -0.648520 -0.557336 -0.947123 -0.355530\n",
       "50% -0.457757 -0.099217 -0.772208 0.067637\n",
       "75% 0.208642 0.920827 -0.488981 0.161362\n",
       "max 2.147829 1.988697 0.105786 0.359107"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Transposition\n",
    "\n",
    "We can obtain the transpose of a DataFrame through its `T` attribute."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       " .dataframe tbody tr th:only-of-type {\n",
       " vertical-align: middle;\n",
       " }\n",
       "\n",
       " .dataframe tbody tr th {\n",
       " vertical-align: top;\n",
       " }\n",
       "\n",
       " .dataframe thead th {\n",
       " text-align: right;\n",
       " }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       " <thead>\n",
       " <tr style=\"text-align: right;\">\n",
       " <th></th>\n",
       " <th>2013-01-01</th>\n",
       " <th>2013-01-02</th>\n",
       " <th>2013-01-03</th>\n",
       " <th>2013-01-04</th>\n",
       " <th>2013-01-05</th>\n",
       " <th>2013-01-06</th>\n",
       " </tr>\n",
       " </thead>\n",
       " <tbody>\n",
       " <tr>\n",
       " <th>A</th>\n",
       " <td>-0.679399</td>\n",
       " <td>2.147829</td>\n",
       " <td>0.398068</td>\n",
       " <td>-1.185011</td>\n",
       " <td>-0.359634</td>\n",
       " <td>-0.555880</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>B</th>\n",
       " <td>-0.564244</td>\n",
       " <td>-0.991826</td>\n",
       " <td>-0.536610</td>\n",
       " <td>1.988697</td>\n",
       " <td>0.338176</td>\n",
       " <td>1.115044</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>C</th>\n",
       " <td>-0.395166</td>\n",
       " <td>-1.004833</td>\n",
       " <td>-0.773990</td>\n",
       " <td>-0.770427</td>\n",
       " <td>0.105786</td>\n",
       " <td>-2.108126</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>D</th>\n",
       " <td>-0.004622</td>\n",
       " <td>0.168517</td>\n",
       " <td>-1.075894</td>\n",
       " <td>-0.472499</td>\n",
       " <td>0.359107</td>\n",
       " <td>0.139896</td>\n",
       " </tr>\n",
       " </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       " 2013-01-01 2013-01-02 2013-01-03 2013-01-04 2013-01-05 2013-01-06\n",
       "A -0.679399 2.147829 0.398068 -1.185011 -0.359634 -0.555880\n",
       "B -0.564244 -0.991826 -0.536610 1.988697 0.338176 1.115044\n",
       "C -0.395166 -1.004833 -0.773990 -0.770427 0.105786 -2.108126\n",
       "D -0.004622 0.168517 -1.075894 -0.472499 0.359107 0.139896"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2.T"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "#### Sorting\n",
    "\n",
    "We can sort the data along either axis or by values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       " .dataframe tbody tr th:only-of-type {\n",
       " vertical-align: middle;\n",
       " }\n",
       "\n",
       " .dataframe tbody tr th {\n",
       " vertical-align: top;\n",
       " }\n",
       "\n",
       " .dataframe thead th {\n",
       " text-align: right;\n",
       " }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       " <thead>\n",
       " <tr style=\"text-align: right;\">\n",
       " <th></th>\n",
       " <th>D</th>\n",
       " <th>C</th>\n",
       " <th>B</th>\n",
       " <th>A</th>\n",
       " </tr>\n",
       " </thead>\n",
       " <tbody>\n",
       " <tr>\n",
       " <th>2013-01-01</th>\n",
       " <td>-0.004622</td>\n",
       " <td>-0.395166</td>\n",
       " <td>-0.564244</td>\n",
       " <td>-0.679399</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>2013-01-02</th>\n",
       " <td>0.168517</td>\n",
       " <td>-1.004833</td>\n",
       " <td>-0.991826</td>\n",
       " <td>2.147829</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>2013-01-03</th>\n",
       " <td>-1.075894</td>\n",
       " <td>-0.773990</td>\n",
       " <td>-0.536610</td>\n",
       " <td>0.398068</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>2013-01-04</th>\n",
       " <td>-0.472499</td>\n",
       " <td>-0.770427</td>\n",
       " <td>1.988697</td>\n",
       " <td>-1.185011</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>2013-01-05</th>\n",
       " <td>0.359107</td>\n",
       " <td>0.105786</td>\n",
       " <td>0.338176</td>\n",
       " <td>-0.359634</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>2013-01-06</th>\n",
       " <td>0.139896</td>\n",
       " <td>-2.108126</td>\n",
       " <td>1.115044</td>\n",
       " <td>-0.555880</td>\n",
       " </tr>\n",
       " </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       " D C B A\n",
       "2013-01-01 -0.004622 -0.395166 -0.564244 -0.679399\n",
       "2013-01-02 0.168517 -1.004833 -0.991826 2.147829\n",
       "2013-01-03 -1.075894 -0.773990 -0.536610 0.398068\n",
       "2013-01-04 -0.472499 -0.770427 1.988697 -1.185011\n",
       "2013-01-05 0.359107 0.105786 0.338176 -0.359634\n",
       "2013-01-06 0.139896 -2.108126 1.115044 -0.555880"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2.sort_index(axis=1, ascending=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       " .dataframe tbody tr th:only-of-type {\n",
       " vertical-align: middle;\n",
       " }\n",
       "\n",
       " .dataframe tbody tr th {\n",
       " vertical-align: top;\n",
       " }\n",
       "\n",
       " .dataframe thead th {\n",
       " text-align: right;\n",
       " }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       " <thead>\n",
       " <tr style=\"text-align: right;\">\n",
       " <th></th>\n",
       " <th>A</th>\n",
       " <th>B</th>\n",
       " <th>C</th>\n",
       " <th>D</th>\n",
       " </tr>\n",
       " </thead>\n",
       " <tbody>\n",
       " <tr>\n",
       " <th>2013-01-02</th>\n",
       " <td>2.147829</td>\n",
       " <td>-0.991826</td>\n",
       " <td>-1.004833</td>\n",
       " <td>0.168517</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>2013-01-01</th>\n",
       " <td>-0.679399</td>\n",
       " <td>-0.564244</td>\n",
       " <td>-0.395166</td>\n",
       " <td>-0.004622</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>2013-01-03</th>\n",
       " <td>0.398068</td>\n",
       " <td>-0.536610</td>\n",
       " <td>-0.773990</td>\n",
       " <td>-1.075894</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>2013-01-05</th>\n",
       " <td>-0.359634</td>\n",
       " <td>0.338176</td>\n",
       " <td>0.105786</td>\n",
       " <td>0.359107</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>2013-01-06</th>\n",
       " <td>-0.555880</td>\n",
       " <td>1.115044</td>\n",
       " <td>-2.108126</td>\n",
       " <td>0.139896</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>2013-01-04</th>\n",
       " <td>-1.185011</td>\n",
       " <td>1.988697</td>\n",
       " <td>-0.770427</td>\n",
       " <td>-0.472499</td>\n",
       " </tr>\n",
       " </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       " A B C D\n",
       "2013-01-02 2.147829 -0.991826 -1.004833 0.168517\n",
       "2013-01-01 -0.679399 -0.564244 -0.395166 -0.004622\n",
       "2013-01-03 0.398068 -0.536610 -0.773990 -1.075894\n",
       "2013-01-05 -0.359634 0.338176 0.105786 0.359107\n",
       "2013-01-06 -0.555880 1.115044 -2.108126 0.139896\n",
       "2013-01-04 -1.185011 1.988697 -0.770427 -0.472499"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2.sort_values(by='B')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Selection\n",
    "\n",
    "We can select data using the standard Python or `numpy` syntax for taking *slices* of lists or arrays."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "In addition, `pandas` provides specialized (and optimized) accessors for the data:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "`.loc`\n",
    "\n",
    "Used primarily for label-based access. It supports the following kinds of input:\n",
    "\n",
    "- A single label: `df.loc['a']`\n",
    "- A list or array of labels: `df.loc[['a', 'b', 'c']]`\n",
    "- A *slice* of labels, with both endpoints included: `df.loc['a':'f']`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "`.iloc`\n",
    "\n",
    "Used primarily for position-based access. It supports the following kinds of input:\n",
    "\n",
    "- An integer: `df.iloc[0]`\n",
    "- A list or array of integers: `df.iloc[[0, 1, 2]]`\n",
    "- A *slice*, with the end position excluded: `df.iloc[1:3]`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "\n",
    "Object type | Selection | Return value\n",
    "---------------|----------------|-------------------------------------\n",
    "Series | `series[label]` | scalar value\n",
    "DataFrame | `frame[colname]` | The `Series` corresponding to `colname`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2013-01-01 -0.679399\n",
       "2013-01-02 2.147829\n",
       "2013-01-03 0.398068\n",
       "2013-01-04 -1.185011\n",
       "2013-01-05 -0.359634\n",
       "2013-01-06 -0.555880\n",
       "Freq: D, Name: A, dtype: float64"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2['A']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "2013-01-01 -0.679399\n",
       "2013-01-02 2.147829\n",
       "2013-01-03 0.398068\n",
       "2013-01-04 -1.185011\n",
       "2013-01-05 -0.359634\n",
       "2013-01-06 -0.555880\n",
       "Freq: D, Name: A, dtype: float64"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2.A"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       " .dataframe tbody tr th:only-of-type {\n",
       " vertical-align: middle;\n",
       " }\n",
       "\n",
       " .dataframe tbody tr th {\n",
       " vertical-align: top;\n",
       " }\n",
       "\n",
       " .dataframe thead th {\n",
       " text-align: right;\n",
       " }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       " <thead>\n",
       " <tr style=\"text-align: right;\">\n",
       " <th></th>\n",
       " <th>A</th>\n",
       " <th>B</th>\n",
       " <th>C</th>\n",
       " <th>D</th>\n",
       " </tr>\n",
       " </thead>\n",
       " <tbody>\n",
       " <tr>\n",
       " <th>2013-01-01</th>\n",
       " <td>-0.679399</td>\n",
       " <td>-0.564244</td>\n",
       " <td>-0.395166</td>\n",
       " <td>-0.004622</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>2013-01-02</th>\n",
       " <td>2.147829</td>\n",
       " <td>-0.991826</td>\n",
       " <td>-1.004833</td>\n",
       " <td>0.168517</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>2013-01-03</th>\n",
       " <td>0.398068</td>\n",
       " <td>-0.536610</td>\n",
       " <td>-0.773990</td>\n",
       " <td>-1.075894</td>\n",
       " </tr>\n",
       " </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       " A B C D\n",
       "2013-01-01 -0.679399 -0.564244 -0.395166 -0.004622\n",
       "2013-01-02 2.147829 -0.991826 -1.004833 0.168517\n",
       "2013-01-03 0.398068 -0.536610 -0.773990 -1.075894"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2[0:3]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       " .dataframe tbody tr th:only-of-type {\n",
       " vertical-align: middle;\n",
       " }\n",
       "\n",
       " .dataframe tbody tr th {\n",
       " vertical-align: top;\n",
       " }\n",
       "\n",
       " .dataframe thead th {\n",
       " text-align: right;\n",
       " }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       " <thead>\n",
       " <tr style=\"text-align: right;\">\n",
       " <th></th>\n",
       " <th>A</th>\n",
       " <th>B</th>\n",
       " <th>C</th>\n",
       " <th>D</th>\n",
       " </tr>\n",
       " </thead>\n",
       " <tbody>\n",
       " <tr>\n",
       " <th>2013-01-02</th>\n",
       " <td>2.147829</td>\n",
       " <td>-0.991826</td>\n",
       " <td>-1.004833</td>\n",
       " <td>0.168517</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>2013-01-03</th>\n",
       " <td>0.398068</td>\n",
       " <td>-0.536610</td>\n",
       " <td>-0.773990</td>\n",
       " <td>-1.075894</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>2013-01-04</th>\n",
       " <td>-1.185011</td>\n",
       " <td>1.988697</td>\n",
       " <td>-0.770427</td>\n",
       " <td>-0.472499</td>\n",
       " </tr>\n",
       " </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       " A B C D\n",
       "2013-01-02 2.147829 -0.991826 -1.004833 0.168517\n",
       "2013-01-03 0.398068 -0.536610 -0.773990 -1.075894\n",
       "2013-01-04 -1.185011 1.988697 -0.770427 -0.472499"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2['20130102':'20130104']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "A -0.679399\n",
       "B -0.564244\n",
       "C -0.395166\n",
       "D -0.004622\n",
       "Name: 2013-01-01 00:00:00, dtype: float64"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2.loc[dates[0]]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       " .dataframe tbody tr th:only-of-type {\n",
       " vertical-align: middle;\n",
       " }\n",
       "\n",
       " .dataframe tbody tr th {\n",
       " vertical-align: top;\n",
       " }\n",
       "\n",
       " .dataframe thead th {\n",
       " text-align: right;\n",
       " }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       " <thead>\n",
       " <tr style=\"text-align: right;\">\n",
       " <th></th>\n",
       " <th>A</th>\n",
       " <th>B</th>\n",
       " </tr>\n",
       " </thead>\n",
       " <tbody>\n",
       " <tr>\n",
       " <th>2013-01-01</th>\n",
       " <td>-0.679399</td>\n",
       " <td>-0.564244</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>2013-01-02</th>\n",
       " <td>2.147829</td>\n",
       " <td>-0.991826</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>2013-01-03</th>\n",
       " <td>0.398068</td>\n",
       " <td>-0.536610</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>2013-01-04</th>\n",
       " <td>-1.185011</td>\n",
       " <td>1.988697</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>2013-01-05</th>\n",
       " <td>-0.359634</td>\n",
       " <td>0.338176</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>2013-01-06</th>\n",
       " <td>-0.555880</td>\n",
       " <td>1.115044</td>\n",
       " </tr>\n",
       " </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       " A B\n",
       "2013-01-01 -0.679399 -0.564244\n",
       "2013-01-02 2.147829 -0.991826\n",
       "2013-01-03 0.398068 -0.536610\n",
       "2013-01-04 -1.185011 1.988697\n",
       "2013-01-05 -0.359634 0.338176\n",
       "2013-01-06 -0.555880 1.115044"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2.loc[:, ['A', 'B']]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       " .dataframe tbody tr th:only-of-type {\n",
       " vertical-align: middle;\n",
       " }\n",
       "\n",
       " .dataframe tbody tr th {\n",
       " vertical-align: top;\n",
       " }\n",
       "\n",
       " .dataframe thead th {\n",
       " text-align: right;\n",
       " }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       " <thead>\n",
       " <tr style=\"text-align: right;\">\n",
       " <th></th>\n",
       " <th>A</th>\n",
       " <th>B</th>\n",
       " </tr>\n",
       " </thead>\n",
       " <tbody>\n",
       " <tr>\n",
       " <th>2013-01-02</th>\n",
       " <td>2.147829</td>\n",
       " <td>-0.991826</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>2013-01-03</th>\n",
       " <td>0.398068</td>\n",
       " <td>-0.536610</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>2013-01-04</th>\n",
       " <td>-1.185011</td>\n",
       " <td>1.988697</td>\n",
       " </tr>\n",
       " </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       " A B\n",
       "2013-01-02 2.147829 -0.991826\n",
       "2013-01-03 0.398068 -0.536610\n",
       "2013-01-04 -1.185011 1.988697"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2.loc['20130102':'20130104', ['A', 'B']]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "A 2.147829\n",
       "B -0.991826\n",
       "Name: 2013-01-02 00:00:00, dtype: float64"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2.loc['20130102', ['A', 'B']]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "A -1.185011\n",
       "B 1.988697\n",
       "C -0.770427\n",
       "D -0.472499\n",
       "Name: 2013-01-04 00:00:00, dtype: float64"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2.iloc[3]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       " .dataframe tbody tr th:only-of-type {\n",
       " vertical-align: middle;\n",
       " }\n",
       "\n",
       " .dataframe tbody tr th {\n",
       " vertical-align: top;\n",
       " }\n",
       "\n",
       " .dataframe thead th {\n",
       " text-align: 
right;\n", 1691 " }\n", 1692 "</style>\n", 1693 "<table border=\"1\" class=\"dataframe\">\n", 1694 " <thead>\n", 1695 " <tr style=\"text-align: right;\">\n", 1696 " <th></th>\n", 1697 " <th>A</th>\n", 1698 " <th>B</th>\n", 1699 " </tr>\n", 1700 " </thead>\n", 1701 " <tbody>\n", 1702 " <tr>\n", 1703 " <th>2013-01-04</th>\n", 1704 " <td>-1.185011</td>\n", 1705 " <td>1.988697</td>\n", 1706 " </tr>\n", 1707 " <tr>\n", 1708 " <th>2013-01-05</th>\n", 1709 " <td>-0.359634</td>\n", 1710 " <td>0.338176</td>\n", 1711 " </tr>\n", 1712 " </tbody>\n", 1713 "</table>\n", 1714 "</div>" 1715 ], 1716 "text/plain": [ 1717 " A B\n", 1718 "2013-01-04 -1.185011 1.988697\n", 1719 "2013-01-05 -0.359634 0.338176" 1720 ] 1721 }, 1722 "execution_count": 32, 1723 "metadata": {}, 1724 "output_type": "execute_result" 1725 } 1726 ], 1727 "source": [ 1728 "df2.iloc[3:5, 0:2]" 1729 ] 1730 }, 1731 { 1732 "cell_type": "markdown", 1733 "metadata": { 1734 "slideshow": { 1735 "slide_type": "slide" 1736 } 1737 }, 1738 "source": [ 1739 "#### Conditional indexing\n", 1740 "\n", 1741 "You can select the rows that satisfy a given condition by passing that condition to the selector." 
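A minimal sketch of boolean-mask selection, assuming a small hand-built frame (the name `demo` and its values are illustrative and do not come from the notebook). The comparison produces a boolean Series aligned with the index, and indexing with it keeps the rows where the mask is `True`. Several conditions can be combined with `&` and `|`, each wrapped in parentheses.

```python
import pandas as pd

# Illustrative data; `demo` is a hypothetical name, not part of the notebook.
demo = pd.DataFrame({"A": [-1.0, 2.0, 0.5, -0.3], "B": [10, 20, 30, 40]})

# The comparison yields a boolean Series aligned with the row index.
mask = demo["A"] > 0

# Indexing with the mask keeps the rows where it is True.
positive = demo[mask]

# Several conditions are combined with & / |, each wrapped in parentheses.
both = demo[(demo["A"] > 0) & (demo["B"] > 20)]

print(positive)
print(both)
```

All columns are preserved in the result; only the set of rows changes.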
1742 ] 1743 }, 1744 { 1745 "cell_type": "code", 1746 "execution_count": 33, 1747 "metadata": { 1748 "slideshow": { 1749 "slide_type": "slide" 1750 } 1751 }, 1752 "outputs": [ 1753 { 1754 "data": { 1755 "text/html": [ 1756 "<div>\n", 1757 "<style scoped>\n", 1758 " .dataframe tbody tr th:only-of-type {\n", 1759 " vertical-align: middle;\n", 1760 " }\n", 1761 "\n", 1762 " .dataframe tbody tr th {\n", 1763 " vertical-align: top;\n", 1764 " }\n", 1765 "\n", 1766 " .dataframe thead th {\n", 1767 " text-align: right;\n", 1768 " }\n", 1769 "</style>\n", 1770 "<table border=\"1\" class=\"dataframe\">\n", 1771 " <thead>\n", 1772 " <tr style=\"text-align: right;\">\n", 1773 " <th></th>\n", 1774 " <th>A</th>\n", 1775 " <th>B</th>\n", 1776 " <th>C</th>\n", 1777 " <th>D</th>\n", 1778 " </tr>\n", 1779 " </thead>\n", 1780 " <tbody>\n", 1781 " <tr>\n", 1782 " <th>2013-01-02</th>\n", 1783 " <td>2.147829</td>\n", 1784 " <td>-0.991826</td>\n", 1785 " <td>-1.004833</td>\n", 1786 " <td>0.168517</td>\n", 1787 " </tr>\n", 1788 " <tr>\n", 1789 " <th>2013-01-03</th>\n", 1790 " <td>0.398068</td>\n", 1791 " <td>-0.536610</td>\n", 1792 " <td>-0.773990</td>\n", 1793 " <td>-1.075894</td>\n", 1794 " </tr>\n", 1795 " </tbody>\n", 1796 "</table>\n", 1797 "</div>" 1798 ], 1799 "text/plain": [ 1800 " A B C D\n", 1801 "2013-01-02 2.147829 -0.991826 -1.004833 0.168517\n", 1802 "2013-01-03 0.398068 -0.536610 -0.773990 -1.075894" 1803 ] 1804 }, 1805 "execution_count": 33, 1806 "metadata": {}, 1807 "output_type": "execute_result" 1808 } 1809 ], 1810 "source": [ 1811 "df2[df2.A > 0]" 1812 ] 1813 }, 1814 { 1815 "cell_type": "markdown", 1816 "metadata": { 1817 "slideshow": { 1818 "slide_type": "slide" 1819 } 1820 }, 1821 "source": [ 1822 "### Operations" 1823 ] 1824 }, 1825 { 1826 "cell_type": "markdown", 1827 "metadata": { 1828 "slideshow": { 1829 "slide_type": "slide" 1830 } 1831 }, 1832 "source": [ 1833 "Basic statistical operations are available by calling the corresponding methods." 
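A minimal sketch of column-wise and row-wise reductions, assuming a tiny illustrative frame (`stats` and its values are hypothetical). By default `mean()` works down each column (`axis=0`) and skips missing values; `axis=1` works across each row instead, and `describe()` gathers several summary statistics at once.

```python
import numpy as np
import pandas as pd

# Illustrative data with one missing value; `stats` is a hypothetical name.
stats = pd.DataFrame({"A": [1.0, 2.0, np.nan], "B": [4.0, 5.0, 6.0]})

# One mean per column; NaN entries are skipped by default (skipna=True).
col_means = stats.mean()

# One mean per row.
row_means = stats.mean(axis=1)

# Count, mean, std, min, quartiles and max for every numeric column.
summary = stats.describe()

print(col_means)
print(row_means)
print(summary)
```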
1834 ] 1835 }, 1836 { 1837 "cell_type": "code", 1838 "execution_count": 34, 1839 "metadata": { 1840 "slideshow": { 1841 "slide_type": "fragment" 1842 } 1843 }, 1844 "outputs": [ 1845 { 1846 "data": { 1847 "text/plain": [ 1848 "A -0.039005\n", 1849 "B 0.224873\n", 1850 "C -0.824459\n", 1851 "D -0.147582\n", 1852 "dtype: float64" 1853 ] 1854 }, 1855 "execution_count": 34, 1856 "metadata": {}, 1857 "output_type": "execute_result" 1858 } 1859 ], 1860 "source": [ 1861 "df2.mean()" 1862 ] 1863 }, 1864 { 1865 "cell_type": "code", 1866 "execution_count": 35, 1867 "metadata": { 1868 "slideshow": { 1869 "slide_type": "fragment" 1870 } 1871 }, 1872 "outputs": [ 1873 { 1874 "data": { 1875 "text/plain": [ 1876 "-0.03900473868952752" 1877 ] 1878 }, 1879 "execution_count": 35, 1880 "metadata": {}, 1881 "output_type": "execute_result" 1882 } 1883 ], 1884 "source": [ 1885 "df2['A'].mean()" 1886 ] 1887 }, 1888 { 1889 "cell_type": "code", 1890 "execution_count": 36, 1891 "metadata": { 1892 "slideshow": { 1893 "slide_type": "slide" 1894 } 1895 }, 1896 "outputs": [ 1897 { 1898 "data": { 1899 "text/plain": [ 1900 "2013-01-01 -0.410858\n", 1901 "2013-01-02 0.079922\n", 1902 "2013-01-03 -0.497107\n", 1903 "2013-01-04 -0.109810\n", 1904 "2013-01-05 0.110859\n", 1905 "2013-01-06 -0.352266\n", 1906 "Freq: D, dtype: float64" 1907 ] 1908 }, 1909 "execution_count": 36, 1910 "metadata": {}, 1911 "output_type": "execute_result" 1912 } 1913 ], 1914 "source": [ 1915 "df2.mean(axis=1)" 1916 ] 1917 }, 1918 { 1919 "cell_type": "markdown", 1920 "metadata": { 1921 "slideshow": { 1922 "slide_type": "slide" 1923 } 1924 }, 1925 "source": [ 1926 "Functions can be applied to the data." 
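A minimal sketch of `DataFrame.apply`, assuming a small illustrative frame (`vals` and its values are hypothetical). By default the function receives each column as a Series; with `axis=1` it receives each row. A function that returns a scalar produces a Series of results, while one that returns a Series of the same length (such as `np.cumsum`) produces a DataFrame.

```python
import numpy as np
import pandas as pd

# Illustrative data; `vals` is a hypothetical name, not part of the notebook.
vals = pd.DataFrame({"A": [1, 4, 2], "B": [10, 7, 9]})

# Called once per column: range (max - min) of each column.
col_range = vals.apply(lambda col: col.max() - col.min())

# Called once per row: range across the columns of each row.
row_range = vals.apply(lambda row: row.max() - row.min(), axis=1)

# A function that returns a Series keeps the tabular shape.
running = vals.apply(np.cumsum)

print(col_range)
print(row_range)
print(running)
```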
1927 ] 1928 }, 1929 { 1930 "cell_type": "code", 1931 "execution_count": 37, 1932 "metadata": { 1933 "slideshow": { 1934 "slide_type": "slide" 1935 } 1936 }, 1937 "outputs": [ 1938 { 1939 "data": { 1940 "text/html": [ 1941 "<div>\n", 1942 "<style scoped>\n", 1943 " .dataframe tbody tr th:only-of-type {\n", 1944 " vertical-align: middle;\n", 1945 " }\n", 1946 "\n", 1947 " .dataframe tbody tr th {\n", 1948 " vertical-align: top;\n", 1949 " }\n", 1950 "\n", 1951 " .dataframe thead th {\n", 1952 " text-align: right;\n", 1953 " }\n", 1954 "</style>\n", 1955 "<table border=\"1\" class=\"dataframe\">\n", 1956 " <thead>\n", 1957 " <tr style=\"text-align: right;\">\n", 1958 " <th></th>\n", 1959 " <th>A</th>\n", 1960 " <th>B</th>\n", 1961 " <th>C</th>\n", 1962 " <th>D</th>\n", 1963 " </tr>\n", 1964 " </thead>\n", 1965 " <tbody>\n", 1966 " <tr>\n", 1967 " <th>2013-01-01</th>\n", 1968 " <td>-0.679399</td>\n", 1969 " <td>-0.564244</td>\n", 1970 " <td>-0.395166</td>\n", 1971 " <td>-0.004622</td>\n", 1972 " </tr>\n", 1973 " <tr>\n", 1974 " <th>2013-01-02</th>\n", 1975 " <td>1.468429</td>\n", 1976 " <td>-1.556070</td>\n", 1977 " <td>-1.400000</td>\n", 1978 " <td>0.163895</td>\n", 1979 " </tr>\n", 1980 " <tr>\n", 1981 " <th>2013-01-03</th>\n", 1982 " <td>1.866497</td>\n", 1983 " <td>-2.092680</td>\n", 1984 " <td>-2.173990</td>\n", 1985 " <td>-0.911998</td>\n", 1986 " </tr>\n", 1987 " <tr>\n", 1988 " <th>2013-01-04</th>\n", 1989 " <td>0.681486</td>\n", 1990 " <td>-0.103983</td>\n", 1991 " <td>-2.944416</td>\n", 1992 " <td>-1.384497</td>\n", 1993 " </tr>\n", 1994 " <tr>\n", 1995 " <th>2013-01-05</th>\n", 1996 " <td>0.321852</td>\n", 1997 " <td>0.234193</td>\n", 1998 " <td>-2.838630</td>\n", 1999 " <td>-1.025391</td>\n", 2000 " </tr>\n", 2001 " <tr>\n", 2002 " <th>2013-01-06</th>\n", 2003 " <td>-0.234028</td>\n", 2004 " <td>1.349237</td>\n", 2005 " <td>-4.946756</td>\n", 2006 " <td>-0.885495</td>\n", 2007 " </tr>\n", 2008 " </tbody>\n", 2009 "</table>\n", 2010 "</div>" 2011 ], 2012 
"text/plain": [ 2013 " A B C D\n", 2014 "2013-01-01 -0.679399 -0.564244 -0.395166 -0.004622\n", 2015 "2013-01-02 1.468429 -1.556070 -1.400000 0.163895\n", 2016 "2013-01-03 1.866497 -2.092680 -2.173990 -0.911998\n", 2017 "2013-01-04 0.681486 -0.103983 -2.944416 -1.384497\n", 2018 "2013-01-05 0.321852 0.234193 -2.838630 -1.025391\n", 2019 "2013-01-06 -0.234028 1.349237 -4.946756 -0.885495" 2020 ] 2021 }, 2022 "execution_count": 37, 2023 "metadata": {}, 2024 "output_type": "execute_result" 2025 } 2026 ], 2027 "source": [ 2028 "df2.apply(np.cumsum)" 2029 ] 2030 }, 2031 { 2032 "cell_type": "code", 2033 "execution_count": 40, 2034 "metadata": { 2035 "slideshow": { 2036 "slide_type": "slide" 2037 } 2038 }, 2039 "outputs": [ 2040 { 2041 "data": { 2042 "text/html": [ 2043 "<div>\n", 2044 "<style scoped>\n", 2045 " .dataframe tbody tr th:only-of-type {\n", 2046 " vertical-align: middle;\n", 2047 " }\n", 2048 "\n", 2049 " .dataframe tbody tr th {\n", 2050 " vertical-align: top;\n", 2051 " }\n", 2052 "\n", 2053 " .dataframe thead th {\n", 2054 " text-align: right;\n", 2055 " }\n", 2056 "</style>\n", 2057 "<table border=\"1\" class=\"dataframe\">\n", 2058 " <thead>\n", 2059 " <tr style=\"text-align: right;\">\n", 2060 " <th></th>\n", 2061 " <th>A</th>\n", 2062 " <th>B</th>\n", 2063 " <th>C</th>\n", 2064 " <th>D</th>\n", 2065 " <th>E</th>\n", 2066 " </tr>\n", 2067 " </thead>\n", 2068 " <tbody>\n", 2069 " <tr>\n", 2070 " <th>2013-01-01</th>\n", 2071 " <td>-0.679399</td>\n", 2072 " <td>-0.564244</td>\n", 2073 " <td>-0.395166</td>\n", 2074 " <td>-0.004622</td>\n", 2075 " <td>0.674777</td>\n", 2076 " </tr>\n", 2077 " <tr>\n", 2078 " <th>2013-01-02</th>\n", 2079 " <td>2.147829</td>\n", 2080 " <td>-0.991826</td>\n", 2081 " <td>-1.004833</td>\n", 2082 " <td>0.168517</td>\n", 2083 " <td>3.152662</td>\n", 2084 " </tr>\n", 2085 " <tr>\n", 2086 " <th>2013-01-03</th>\n", 2087 " <td>0.398068</td>\n", 2088 " <td>-0.536610</td>\n", 2089 " <td>-0.773990</td>\n", 2090 " <td>-1.075894</td>\n", 
2091 " <td>1.473961</td>\n", 2092 " </tr>\n", 2093 " <tr>\n", 2094 " <th>2013-01-04</th>\n", 2095 " <td>-1.185011</td>\n", 2096 " <td>1.988697</td>\n", 2097 " <td>-0.770427</td>\n", 2098 " <td>-0.472499</td>\n", 2099 " <td>3.173708</td>\n", 2100 " </tr>\n", 2101 " <tr>\n", 2102 " <th>2013-01-05</th>\n", 2103 " <td>-0.359634</td>\n", 2104 " <td>0.338176</td>\n", 2105 " <td>0.105786</td>\n", 2106 " <td>0.359107</td>\n", 2107 " <td>0.718741</td>\n", 2108 " </tr>\n", 2109 " <tr>\n", 2110 " <th>2013-01-06</th>\n", 2111 " <td>-0.555880</td>\n", 2112 " <td>1.115044</td>\n", 2113 " <td>-2.108126</td>\n", 2114 " <td>0.139896</td>\n", 2115 " <td>3.223170</td>\n", 2116 " </tr>\n", 2117 " </tbody>\n", 2118 "</table>\n", 2119 "</div>" 2120 ], 2121 "text/plain": [ 2122 " A B C D E\n", 2123 "2013-01-01 -0.679399 -0.564244 -0.395166 -0.004622 0.674777\n", 2124 "2013-01-02 2.147829 -0.991826 -1.004833 0.168517 3.152662\n", 2125 "2013-01-03 0.398068 -0.536610 -0.773990 -1.075894 1.473961\n", 2126 "2013-01-04 -1.185011 1.988697 -0.770427 -0.472499 3.173708\n", 2127 "2013-01-05 -0.359634 0.338176 0.105786 0.359107 0.718741\n", 2128 "2013-01-06 -0.555880 1.115044 -2.108126 0.139896 3.223170" 2129 ] 2130 }, 2131 "execution_count": 40, 2132 "metadata": {}, 2133 "output_type": "execute_result" 2134 } 2135 ], 2136 "source": [ 2137 "c = df2.apply(lambda x: x.max() - x.min(), axis=1)\n", 2138 "df2['E'] = c\n", 2139 "df2" 2140 ] 2141 }, 2142 { 2143 "cell_type": "code", 2144 "execution_count": 41, 2145 "metadata": { 2146 "slideshow": { 2147 "slide_type": "fragment" 2148 } 2149 }, 2150 "outputs": [ 2151 { 2152 "data": { 2153 "text/plain": [ 2154 "2013-01-01 1.354177\n", 2155 "2013-01-02 4.157495\n", 2156 "2013-01-03 2.549855\n", 2157 "2013-01-04 4.358719\n", 2158 "2013-01-05 1.078375\n", 2159 "2013-01-06 5.331296\n", 2160 "Freq: D, dtype: float64" 2161 ] 2162 }, 2163 "execution_count": 41, 2164 "metadata": {}, 2165 "output_type": "execute_result" 2166 } 2167 ], 2168 "source": [ 2169 
"df2.apply(lambda x: x.max() - x.min(), axis=1)" 2170 ] 2171 }, 2172 { 2173 "cell_type": "markdown", 2174 "metadata": { 2175 "slideshow": { 2176 "slide_type": "slide" 2177 } 2178 }, 2179 "source": [ 2180 "### Uniones\n", 2181 "\n", 2182 "La librería `pandas` proporciona diferentes métodos para la unión de Series o DataFrame." 2183 ] 2184 }, 2185 { 2186 "cell_type": "markdown", 2187 "metadata": { 2188 "slideshow": { 2189 "slide_type": "slide" 2190 } 2191 }, 2192 "source": [ 2193 "#### Concat" 2194 ] 2195 }, 2196 { 2197 "cell_type": "code", 2198 "execution_count": 42, 2199 "metadata": { 2200 "slideshow": { 2201 "slide_type": "slide" 2202 } 2203 }, 2204 "outputs": [ 2205 { 2206 "data": { 2207 "text/html": [ 2208 "<div>\n", 2209 "<style scoped>\n", 2210 " .dataframe tbody tr th:only-of-type {\n", 2211 " vertical-align: middle;\n", 2212 " }\n", 2213 "\n", 2214 " .dataframe tbody tr th {\n", 2215 " vertical-align: top;\n", 2216 " }\n", 2217 "\n", 2218 " .dataframe thead th {\n", 2219 " text-align: right;\n", 2220 " }\n", 2221 "</style>\n", 2222 "<table border=\"1\" class=\"dataframe\">\n", 2223 " <thead>\n", 2224 " <tr style=\"text-align: right;\">\n", 2225 " <th></th>\n", 2226 " <th>0</th>\n", 2227 " <th>1</th>\n", 2228 " <th>2</th>\n", 2229 " <th>3</th>\n", 2230 " </tr>\n", 2231 " </thead>\n", 2232 " <tbody>\n", 2233 " <tr>\n", 2234 " <th>0</th>\n", 2235 " <td>-0.634450</td>\n", 2236 " <td>0.763724</td>\n", 2237 " <td>0.710228</td>\n", 2238 " <td>-0.694768</td>\n", 2239 " </tr>\n", 2240 " <tr>\n", 2241 " <th>1</th>\n", 2242 " <td>-0.142616</td>\n", 2243 " <td>1.630704</td>\n", 2244 " <td>1.029687</td>\n", 2245 " <td>-1.008484</td>\n", 2246 " </tr>\n", 2247 " <tr>\n", 2248 " <th>2</th>\n", 2249 " <td>-0.344466</td>\n", 2250 " <td>-0.222917</td>\n", 2251 " <td>0.294177</td>\n", 2252 " <td>-0.859483</td>\n", 2253 " </tr>\n", 2254 " <tr>\n", 2255 " <th>3</th>\n", 2256 " <td>1.012883</td>\n", 2257 " <td>-0.369916</td>\n", 2258 " <td>-0.552784</td>\n", 2259 " 
<td>1.356238</td>\n", 2260 " </tr>\n", 2261 " <tr>\n", 2262 " <th>4</th>\n", 2263 " <td>-0.167002</td>\n", 2264 " <td>1.677076</td>\n", 2265 " <td>-0.454767</td>\n", 2266 " <td>1.183958</td>\n", 2267 " </tr>\n", 2268 " <tr>\n", 2269 " <th>5</th>\n", 2270 " <td>-0.528190</td>\n", 2271 " <td>-0.912389</td>\n", 2272 " <td>0.786753</td>\n", 2273 " <td>1.043857</td>\n", 2274 " </tr>\n", 2275 " <tr>\n", 2276 " <th>6</th>\n", 2277 " <td>0.527898</td>\n", 2278 " <td>-0.379471</td>\n", 2279 " <td>1.537252</td>\n", 2280 " <td>-1.050597</td>\n", 2281 " </tr>\n", 2282 " <tr>\n", 2283 " <th>7</th>\n", 2284 " <td>-0.352473</td>\n", 2285 " <td>-1.825571</td>\n", 2286 " <td>0.186576</td>\n", 2287 " <td>0.977988</td>\n", 2288 " </tr>\n", 2289 " <tr>\n", 2290 " <th>8</th>\n", 2291 " <td>0.991172</td>\n", 2292 " <td>-0.030169</td>\n", 2293 " <td>-1.816031</td>\n", 2294 " <td>0.601092</td>\n", 2295 " </tr>\n", 2296 " <tr>\n", 2297 " <th>9</th>\n", 2298 " <td>1.522968</td>\n", 2299 " <td>0.440188</td>\n", 2300 " <td>-1.763289</td>\n", 2301 " <td>1.840091</td>\n", 2302 " </tr>\n", 2303 " </tbody>\n", 2304 "</table>\n", 2305 "</div>" 2306 ], 2307 "text/plain": [ 2308 " 0 1 2 3\n", 2309 "0 -0.634450 0.763724 0.710228 -0.694768\n", 2310 "1 -0.142616 1.630704 1.029687 -1.008484\n", 2311 "2 -0.344466 -0.222917 0.294177 -0.859483\n", 2312 "3 1.012883 -0.369916 -0.552784 1.356238\n", 2313 "4 -0.167002 1.677076 -0.454767 1.183958\n", 2314 "5 -0.528190 -0.912389 0.786753 1.043857\n", 2315 "6 0.527898 -0.379471 1.537252 -1.050597\n", 2316 "7 -0.352473 -1.825571 0.186576 0.977988\n", 2317 "8 0.991172 -0.030169 -1.816031 0.601092\n", 2318 "9 1.522968 0.440188 -1.763289 1.840091" 2319 ] 2320 }, 2321 "execution_count": 42, 2322 "metadata": {}, 2323 "output_type": "execute_result" 2324 } 2325 ], 2326 "source": [ 2327 "df = pd.DataFrame(np.random.randn(10, 4))\n", 2328 "df" 2329 ] 2330 }, 2331 { 2332 "cell_type": "code", 2333 "execution_count": 44, 2334 "metadata": { 2335 "slideshow": { 2336 
"slide_type": "slide" 2337 } 2338 }, 2339 "outputs": [ 2340 { 2341 "data": { 2342 "text/plain": [ 2343 "[ 0 1 2 3\n", 2344 " 0 -0.634450 0.763724 0.710228 -0.694768\n", 2345 " 1 -0.142616 1.630704 1.029687 -1.008484\n", 2346 " 2 -0.344466 -0.222917 0.294177 -0.859483,\n", 2347 " 0 1 2 3\n", 2348 " 3 1.012883 -0.369916 -0.552784 1.356238\n", 2349 " 4 -0.167002 1.677076 -0.454767 1.183958\n", 2350 " 5 -0.528190 -0.912389 0.786753 1.043857\n", 2351 " 6 0.527898 -0.379471 1.537252 -1.050597,\n", 2352 " 0 1 2 3\n", 2353 " 7 -0.352473 -1.825571 0.186576 0.977988\n", 2354 " 8 0.991172 -0.030169 -1.816031 0.601092\n", 2355 " 9 1.522968 0.440188 -1.763289 1.840091]" 2356 ] 2357 }, 2358 "execution_count": 44, 2359 "metadata": {}, 2360 "output_type": "execute_result" 2361 } 2362 ], 2363 "source": [ 2364 "pieces = [df[:3], df[3:7], df[7:]]\n", 2365 "pieces" 2366 ] 2367 }, 2368 { 2369 "cell_type": "code", 2370 "execution_count": 45, 2371 "metadata": { 2372 "slideshow": { 2373 "slide_type": "slide" 2374 } 2375 }, 2376 "outputs": [ 2377 { 2378 "data": { 2379 "text/html": [ 2380 "<div>\n", 2381 "<style scoped>\n", 2382 " .dataframe tbody tr th:only-of-type {\n", 2383 " vertical-align: middle;\n", 2384 " }\n", 2385 "\n", 2386 " .dataframe tbody tr th {\n", 2387 " vertical-align: top;\n", 2388 " }\n", 2389 "\n", 2390 " .dataframe thead th {\n", 2391 " text-align: right;\n", 2392 " }\n", 2393 "</style>\n", 2394 "<table border=\"1\" class=\"dataframe\">\n", 2395 " <thead>\n", 2396 " <tr style=\"text-align: right;\">\n", 2397 " <th></th>\n", 2398 " <th>0</th>\n", 2399 " <th>1</th>\n", 2400 " <th>2</th>\n", 2401 " <th>3</th>\n", 2402 " </tr>\n", 2403 " </thead>\n", 2404 " <tbody>\n", 2405 " <tr>\n", 2406 " <th>0</th>\n", 2407 " <td>-0.634450</td>\n", 2408 " <td>0.763724</td>\n", 2409 " <td>0.710228</td>\n", 2410 " <td>-0.694768</td>\n", 2411 " </tr>\n", 2412 " <tr>\n", 2413 " <th>1</th>\n", 2414 " <td>-0.142616</td>\n", 2415 " <td>1.630704</td>\n", 2416 " <td>1.029687</td>\n", 2417 " 
<td>-1.008484</td>\n", 2418 " </tr>\n", 2419 " <tr>\n", 2420 " <th>2</th>\n", 2421 " <td>-0.344466</td>\n", 2422 " <td>-0.222917</td>\n", 2423 " <td>0.294177</td>\n", 2424 " <td>-0.859483</td>\n", 2425 " </tr>\n", 2426 " <tr>\n", 2427 " <th>3</th>\n", 2428 " <td>1.012883</td>\n", 2429 " <td>-0.369916</td>\n", 2430 " <td>-0.552784</td>\n", 2431 " <td>1.356238</td>\n", 2432 " </tr>\n", 2433 " <tr>\n", 2434 " <th>4</th>\n", 2435 " <td>-0.167002</td>\n", 2436 " <td>1.677076</td>\n", 2437 " <td>-0.454767</td>\n", 2438 " <td>1.183958</td>\n", 2439 " </tr>\n", 2440 " <tr>\n", 2441 " <th>5</th>\n", 2442 " <td>-0.528190</td>\n", 2443 " <td>-0.912389</td>\n", 2444 " <td>0.786753</td>\n", 2445 " <td>1.043857</td>\n", 2446 " </tr>\n", 2447 " <tr>\n", 2448 " <th>6</th>\n", 2449 " <td>0.527898</td>\n", 2450 " <td>-0.379471</td>\n", 2451 " <td>1.537252</td>\n", 2452 " <td>-1.050597</td>\n", 2453 " </tr>\n", 2454 " <tr>\n", 2455 " <th>7</th>\n", 2456 " <td>-0.352473</td>\n", 2457 " <td>-1.825571</td>\n", 2458 " <td>0.186576</td>\n", 2459 " <td>0.977988</td>\n", 2460 " </tr>\n", 2461 " <tr>\n", 2462 " <th>8</th>\n", 2463 " <td>0.991172</td>\n", 2464 " <td>-0.030169</td>\n", 2465 " <td>-1.816031</td>\n", 2466 " <td>0.601092</td>\n", 2467 " </tr>\n", 2468 " <tr>\n", 2469 " <th>9</th>\n", 2470 " <td>1.522968</td>\n", 2471 " <td>0.440188</td>\n", 2472 " <td>-1.763289</td>\n", 2473 " <td>1.840091</td>\n", 2474 " </tr>\n", 2475 " </tbody>\n", 2476 "</table>\n", 2477 "</div>" 2478 ], 2479 "text/plain": [ 2480 " 0 1 2 3\n", 2481 "0 -0.634450 0.763724 0.710228 -0.694768\n", 2482 "1 -0.142616 1.630704 1.029687 -1.008484\n", 2483 "2 -0.344466 -0.222917 0.294177 -0.859483\n", 2484 "3 1.012883 -0.369916 -0.552784 1.356238\n", 2485 "4 -0.167002 1.677076 -0.454767 1.183958\n", 2486 "5 -0.528190 -0.912389 0.786753 1.043857\n", 2487 "6 0.527898 -0.379471 1.537252 -1.050597\n", 2488 "7 -0.352473 -1.825571 0.186576 0.977988\n", 2489 "8 0.991172 -0.030169 -1.816031 0.601092\n", 2490 "9 1.522968 
0.440188 -1.763289 1.840091" 2491 ] 2492 }, 2493 "execution_count": 45, 2494 "metadata": {}, 2495 "output_type": "execute_result" 2496 } 2497 ], 2498 "source": [ 2499 "pd.concat(pieces)" 2500 ] 2501 }, 2502 { 2503 "cell_type": "markdown", 2504 "metadata": { 2505 "slideshow": { 2506 "slide_type": "slide" 2507 } 2508 }, 2509 "source": [ 2510 "#### Join" 2511 ] 2512 }, 2513 { 2514 "cell_type": "code", 2515 "execution_count": 46, 2516 "metadata": { 2517 "slideshow": { 2518 "slide_type": "slide" 2519 } 2520 }, 2521 "outputs": [ 2522 { 2523 "data": { 2524 "text/html": [ 2525 "<div>\n", 2526 "<style scoped>\n", 2527 " .dataframe tbody tr th:only-of-type {\n", 2528 " vertical-align: middle;\n", 2529 " }\n", 2530 "\n", 2531 " .dataframe tbody tr th {\n", 2532 " vertical-align: top;\n", 2533 " }\n", 2534 "\n", 2535 " .dataframe thead th {\n", 2536 " text-align: right;\n", 2537 " }\n", 2538 "</style>\n", 2539 "<table border=\"1\" class=\"dataframe\">\n", 2540 " <thead>\n", 2541 " <tr style=\"text-align: right;\">\n", 2542 " <th></th>\n", 2543 " <th>key</th>\n", 2544 " <th>lval</th>\n", 2545 " </tr>\n", 2546 " </thead>\n", 2547 " <tbody>\n", 2548 " <tr>\n", 2549 " <th>0</th>\n", 2550 " <td>foo</td>\n", 2551 " <td>1</td>\n", 2552 " </tr>\n", 2553 " <tr>\n", 2554 " <th>1</th>\n", 2555 " <td>foo</td>\n", 2556 " <td>2</td>\n", 2557 " </tr>\n", 2558 " </tbody>\n", 2559 "</table>\n", 2560 "</div>" 2561 ], 2562 "text/plain": [ 2563 " key lval\n", 2564 "0 foo 1\n", 2565 "1 foo 2" 2566 ] 2567 }, 2568 "execution_count": 46, 2569 "metadata": {}, 2570 "output_type": "execute_result" 2571 } 2572 ], 2573 "source": [ 2574 "left = pd.DataFrame({'key': ['foo', 'foo'], 'lval': [1, 2]})\n", 2575 "left" 2576 ] 2577 }, 2578 { 2579 "cell_type": "code", 2580 "execution_count": 48, 2581 "metadata": { 2582 "slideshow": { 2583 "slide_type": "slide" 2584 } 2585 }, 2586 "outputs": [ 2587 { 2588 "data": { 2589 "text/html": [ 2590 "<div>\n", 2591 "<style scoped>\n", 2592 " .dataframe tbody tr 
th:only-of-type {\n", 2593 " vertical-align: middle;\n", 2594 " }\n", 2595 "\n", 2596 " .dataframe tbody tr th {\n", 2597 " vertical-align: top;\n", 2598 " }\n", 2599 "\n", 2600 " .dataframe thead th {\n", 2601 " text-align: right;\n", 2602 " }\n", 2603 "</style>\n", 2604 "<table border=\"1\" class=\"dataframe\">\n", 2605 " <thead>\n", 2606 " <tr style=\"text-align: right;\">\n", 2607 " <th></th>\n", 2608 " <th>key</th>\n", 2609 " <th>rval</th>\n", 2610 " </tr>\n", 2611 " </thead>\n", 2612 " <tbody>\n", 2613 " <tr>\n", 2614 " <th>0</th>\n", 2615 " <td>foo</td>\n", 2616 " <td>4</td>\n", 2617 " </tr>\n", 2618 " <tr>\n", 2619 " <th>1</th>\n", 2620 " <td>foo</td>\n", 2621 " <td>5</td>\n", 2622 " </tr>\n", 2623 " </tbody>\n", 2624 "</table>\n", 2625 "</div>" 2626 ], 2627 "text/plain": [ 2628 " key rval\n", 2629 "0 foo 4\n", 2630 "1 foo 5" 2631 ] 2632 }, 2633 "execution_count": 48, 2634 "metadata": {}, 2635 "output_type": "execute_result" 2636 } 2637 ], 2638 "source": [ 2639 "right = pd.DataFrame({'key': ['foo', 'foo'], 'rval': [4, 5]})\n", 2640 "right" 2641 ] 2642 }, 2643 { 2644 "cell_type": "code", 2645 "execution_count": 49, 2646 "metadata": { 2647 "slideshow": { 2648 "slide_type": "slide" 2649 } 2650 }, 2651 "outputs": [ 2652 { 2653 "data": { 2654 "text/html": [ 2655 "<div>\n", 2656 "<style scoped>\n", 2657 " .dataframe tbody tr th:only-of-type {\n", 2658 " vertical-align: middle;\n", 2659 " }\n", 2660 "\n", 2661 " .dataframe tbody tr th {\n", 2662 " vertical-align: top;\n", 2663 " }\n", 2664 "\n", 2665 " .dataframe thead th {\n", 2666 " text-align: right;\n", 2667 " }\n", 2668 "</style>\n", 2669 "<table border=\"1\" class=\"dataframe\">\n", 2670 " <thead>\n", 2671 " <tr style=\"text-align: right;\">\n", 2672 " <th></th>\n", 2673 " <th>key</th>\n", 2674 " <th>lval</th>\n", 2675 " <th>rval</th>\n", 2676 " </tr>\n", 2677 " </thead>\n", 2678 " <tbody>\n", 2679 " <tr>\n", 2680 " <th>0</th>\n", 2681 " <td>foo</td>\n", 2682 " <td>1</td>\n", 2683 " <td>4</td>\n", 2684 " 
</tr>\n", 2685 " <tr>\n", 2686 " <th>1</th>\n", 2687 " <td>foo</td>\n", 2688 " <td>1</td>\n", 2689 " <td>5</td>\n", 2690 " </tr>\n", 2691 " <tr>\n", 2692 " <th>2</th>\n", 2693 " <td>foo</td>\n", 2694 " <td>2</td>\n", 2695 " <td>4</td>\n", 2696 " </tr>\n", 2697 " <tr>\n", 2698 " <th>3</th>\n", 2699 " <td>foo</td>\n", 2700 " <td>2</td>\n", 2701 " <td>5</td>\n", 2702 " </tr>\n", 2703 " </tbody>\n", 2704 "</table>\n", 2705 "</div>" 2706 ], 2707 "text/plain": [ 2708 " key lval rval\n", 2709 "0 foo 1 4\n", 2710 "1 foo 1 5\n", 2711 "2 foo 2 4\n", 2712 "3 foo 2 5" 2713 ] 2714 }, 2715 "execution_count": 49, 2716 "metadata": {}, 2717 "output_type": "execute_result" 2718 } 2719 ], 2720 "source": [ 2721 "pd.merge(left, right, on='key')" 2722 ] 2723 }, 2724 { 2725 "cell_type": "markdown", 2726 "metadata": { 2727 "slideshow": { 2728 "slide_type": "slide" 2729 } 2730 }, 2731 "source": [ 2732 "#### Append" 2733 ] 2734 }, 2735 { 2736 "cell_type": "code", 2737 "execution_count": 50, 2738 "metadata": { 2739 "slideshow": { 2740 "slide_type": "slide" 2741 } 2742 }, 2743 "outputs": [ 2744 { 2745 "data": { 2746 "text/html": [ 2747 "<div>\n", 2748 "<style scoped>\n", 2749 " .dataframe tbody tr th:only-of-type {\n", 2750 " vertical-align: middle;\n", 2751 " }\n", 2752 "\n", 2753 " .dataframe tbody tr th {\n", 2754 " vertical-align: top;\n", 2755 " }\n", 2756 "\n", 2757 " .dataframe thead th {\n", 2758 " text-align: right;\n", 2759 " }\n", 2760 "</style>\n", 2761 "<table border=\"1\" class=\"dataframe\">\n", 2762 " <thead>\n", 2763 " <tr style=\"text-align: right;\">\n", 2764 " <th></th>\n", 2765 " <th>A</th>\n", 2766 " <th>B</th>\n", 2767 " <th>C</th>\n", 2768 " <th>D</th>\n", 2769 " </tr>\n", 2770 " </thead>\n", 2771 " <tbody>\n", 2772 " <tr>\n", 2773 " <th>0</th>\n", 2774 " <td>-0.362774</td>\n", 2775 " <td>-0.573908</td>\n", 2776 " <td>0.098044</td>\n", 2777 " <td>1.992482</td>\n", 2778 " </tr>\n", 2779 " <tr>\n", 2780 " <th>1</th>\n", 2781 " <td>1.437667</td>\n", 2782 " 
<td>0.940580</td>\n",
       " <td>-0.355047</td>\n",
       " <td>-0.142454</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>2</th>\n",
       " <td>-1.097556</td>\n",
       " <td>-0.593504</td>\n",
       " <td>-1.313146</td>\n",
       " <td>-0.490131</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>3</th>\n",
       " <td>1.028989</td>\n",
       " <td>0.098031</td>\n",
       " <td>0.881277</td>\n",
       " <td>0.426499</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>4</th>\n",
       " <td>-0.589829</td>\n",
       " <td>-0.331404</td>\n",
       " <td>0.692164</td>\n",
       " <td>0.456827</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>5</th>\n",
       " <td>-0.158751</td>\n",
       " <td>-0.199149</td>\n",
       " <td>-0.395195</td>\n",
       " <td>0.882798</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>6</th>\n",
       " <td>-0.021648</td>\n",
       " <td>0.764384</td>\n",
       " <td>0.408657</td>\n",
       " <td>-1.262260</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>7</th>\n",
       " <td>-1.113406</td>\n",
       " <td>0.107256</td>\n",
       " <td>0.420511</td>\n",
       " <td>-0.968303</td>\n",
       " </tr>\n",
       " </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       " A B C D\n",
       "0 -0.362774 -0.573908 0.098044 1.992482\n",
       "1 1.437667 0.940580 -0.355047 -0.142454\n",
       "2 -1.097556 -0.593504 -1.313146 -0.490131\n",
       "3 1.028989 0.098031 0.881277 0.426499\n",
       "4 -0.589829 -0.331404 0.692164 0.456827\n",
       "5 -0.158751 -0.199149 -0.395195 0.882798\n",
       "6 -0.021648 0.764384 0.408657 -1.262260\n",
       "7 -1.113406 0.107256 0.420511 -0.968303"
      ]
     },
     "execution_count": 50,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.DataFrame(np.random.randn(8, 4), columns=['A', 'B', 'C', 'D'])\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "A 1.028989\n",
       "B 0.098031\n",
       "C 0.881277\n",
       "D 0.426499\n",
       "Name: 3, dtype: float64"
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s = df.iloc[3]\n",
    "s"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       " .dataframe tbody tr th:only-of-type {\n",
       " vertical-align: middle;\n",
       " }\n",
       "\n",
       " .dataframe tbody tr th {\n",
       " vertical-align: top;\n",
       " }\n",
       "\n",
       " .dataframe thead th {\n",
       " text-align: right;\n",
       " }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       " <thead>\n",
       " <tr style=\"text-align: right;\">\n",
       " <th></th>\n",
       " <th>A</th>\n",
       " <th>B</th>\n",
       " <th>C</th>\n",
       " <th>D</th>\n",
       " </tr>\n",
       " </thead>\n",
       " <tbody>\n",
       " <tr>\n",
       " <th>0</th>\n",
       " <td>-0.362774</td>\n",
       " <td>-0.573908</td>\n",
       " <td>0.098044</td>\n",
       " <td>1.992482</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>1</th>\n",
       " <td>1.437667</td>\n",
       " <td>0.940580</td>\n",
       " <td>-0.355047</td>\n",
       " <td>-0.142454</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>2</th>\n",
       " <td>-1.097556</td>\n",
       " <td>-0.593504</td>\n",
       " <td>-1.313146</td>\n",
       " <td>-0.490131</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>3</th>\n",
       " <td>1.028989</td>\n",
       " <td>0.098031</td>\n",
       " <td>0.881277</td>\n",
       " <td>0.426499</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>4</th>\n",
       " <td>-0.589829</td>\n",
       " <td>-0.331404</td>\n",
       " <td>0.692164</td>\n",
       " <td>0.456827</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>5</th>\n",
       " <td>-0.158751</td>\n",
       " <td>-0.199149</td>\n",
       " <td>-0.395195</td>\n",
       " <td>0.882798</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>6</th>\n",
       " <td>-0.021648</td>\n",
       " <td>0.764384</td>\n",
       " <td>0.408657</td>\n",
       " <td>-1.262260</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>7</th>\n",
       " <td>-1.113406</td>\n",
       " <td>0.107256</td>\n",
       " <td>0.420511</td>\n",
       " <td>-0.968303</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>8</th>\n",
       " <td>1.028989</td>\n",
       " <td>0.098031</td>\n",
       " <td>0.881277</td>\n",
       " <td>0.426499</td>\n",
       " </tr>\n",
       " </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       " A B C D\n",
       "0 -0.362774 -0.573908 0.098044 1.992482\n",
       "1 1.437667 0.940580 -0.355047 -0.142454\n",
       "2 -1.097556 -0.593504 -1.313146 -0.490131\n",
       "3 1.028989 0.098031 0.881277 0.426499\n",
       "4 -0.589829 -0.331404 0.692164 0.456827\n",
       "5 -0.158751 -0.199149 -0.395195 0.882798\n",
       "6 -0.021648 0.764384 0.408657 -1.262260\n",
       "7 -1.113406 0.107256 0.420511 -0.968303\n",
       "8 1.028989 0.098031 0.881277 0.426499"
      ]
     },
     "execution_count": 52,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# DataFrame.append was removed in pandas 2.0; pd.concat is the supported replacement.\n",
    "# s.to_frame().T turns the Series into a one-row DataFrame before concatenating.\n",
    "pd.concat([df, s.to_frame().T], ignore_index=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Grouping\n",
    "\n",
    "When we talk about grouping data in `pandas`, we mean a process that involves one or more of the following steps:\n",
    "\n",
    "- Splitting the data into groups based on some criterion\n",
    "- Applying a function to each group independently\n",
    "- Combining the results into a data structure"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       " .dataframe tbody tr th:only-of-type {\n",
       " vertical-align: middle;\n",
       " }\n",
       "\n",
       " .dataframe tbody tr th {\n",
       " vertical-align: top;\n",
       " }\n",
       "\n",
       " .dataframe thead th {\n",
       " text-align: right;\n",
       " }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       " <thead>\n",
       " <tr style=\"text-align: right;\">\n",
       " <th></th>\n",
       " <th>A</th>\n",
       " <th>B</th>\n",
       " <th>C</th>\n",
       " <th>D</th>\n",
       " </tr>\n",
       " </thead>\n",
       " <tbody>\n",
       " <tr>\n",
       " <th>0</th>\n",
       " <td>foo</td>\n",
       " <td>one</td>\n",
       " <td>-1.976302</td>\n",
       " <td>-0.708903</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>1</th>\n",
       " <td>bar</td>\n",
       " <td>one</td>\n",
       " <td>-1.709147</td>\n",
       " <td>-0.680945</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>2</th>\n",
       " <td>foo</td>\n",
       " <td>two</td>\n",
       " <td>0.229683</td>\n",
       " <td>-0.613908</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>3</th>\n",
       " <td>bar</td>\n",
       " <td>three</td>\n",
       " <td>0.917311</td>\n",
       " <td>-0.819363</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>4</th>\n",
       " <td>foo</td>\n",
       " <td>two</td>\n",
       " <td>-1.245424</td>\n",
       " <td>-1.041576</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>5</th>\n",
       " <td>bar</td>\n",
       " <td>two</td>\n",
       " <td>0.904258</td>\n",
       " <td>-1.698605</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>6</th>\n",
       " <td>foo</td>\n",
       " <td>one</td>\n",
       " <td>-1.215414</td>\n",
       " <td>1.879422</td>\n",
       " </tr>\n",
       " <tr>\n",
" <th>7</th>\n", 3114 " <td>foo</td>\n", 3115 " <td>three</td>\n", 3116 " <td>1.406019</td>\n", 3117 " <td>-0.603691</td>\n", 3118 " </tr>\n", 3119 " </tbody>\n", 3120 "</table>\n", 3121 "</div>" 3122 ], 3123 "text/plain": [ 3124 " A B C D\n", 3125 "0 foo one -1.976302 -0.708903\n", 3126 "1 bar one -1.709147 -0.680945\n", 3127 "2 foo two 0.229683 -0.613908\n", 3128 "3 bar three 0.917311 -0.819363\n", 3129 "4 foo two -1.245424 -1.041576\n", 3130 "5 bar two 0.904258 -1.698605\n", 3131 "6 foo one -1.215414 1.879422\n", 3132 "7 foo three 1.406019 -0.603691" 3133 ] 3134 }, 3135 "execution_count": 53, 3136 "metadata": {}, 3137 "output_type": "execute_result" 3138 } 3139 ], 3140 "source": [ 3141 "df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',\n", 3142 " 'foo', 'bar', 'foo', 'foo'],\n", 3143 " 'B': ['one', 'one', 'two', 'three',\n", 3144 " 'two', 'two', 'one', 'three'],\n", 3145 " 'C': np.random.randn(8),\n", 3146 " 'D': np.random.randn(8)})\n", 3147 "df" 3148 ] 3149 }, 3150 { 3151 "cell_type": "code", 3152 "execution_count": 56, 3153 "metadata": { 3154 "slideshow": { 3155 "slide_type": "slide" 3156 } 3157 }, 3158 "outputs": [ 3159 { 3160 "data": { 3161 "text/html": [ 3162 "<div>\n", 3163 "<style scoped>\n", 3164 " .dataframe tbody tr th:only-of-type {\n", 3165 " vertical-align: middle;\n", 3166 " }\n", 3167 "\n", 3168 " .dataframe tbody tr th {\n", 3169 " vertical-align: top;\n", 3170 " }\n", 3171 "\n", 3172 " .dataframe thead th {\n", 3173 " text-align: right;\n", 3174 " }\n", 3175 "</style>\n", 3176 "<table border=\"1\" class=\"dataframe\">\n", 3177 " <thead>\n", 3178 " <tr style=\"text-align: right;\">\n", 3179 " <th></th>\n", 3180 " <th>C</th>\n", 3181 " <th>D</th>\n", 3182 " </tr>\n", 3183 " <tr>\n", 3184 " <th>A</th>\n", 3185 " <th></th>\n", 3186 " <th></th>\n", 3187 " </tr>\n", 3188 " </thead>\n", 3189 " <tbody>\n", 3190 " <tr>\n", 3191 " <th>bar</th>\n", 3192 " <td>0.112423</td>\n", 3193 " <td>-3.198913</td>\n", 3194 " </tr>\n", 3195 " <tr>\n", 3196 " 
<th>foo</th>\n", 3197 " <td>-2.801438</td>\n", 3198 " <td>-1.088656</td>\n", 3199 " </tr>\n", 3200 " </tbody>\n", 3201 "</table>\n", 3202 "</div>" 3203 ], 3204 "text/plain": [ 3205 " C D\n", 3206 "A \n", 3207 "bar 0.112423 -3.198913\n", 3208 "foo -2.801438 -1.088656" 3209 ] 3210 }, 3211 "execution_count": 56, 3212 "metadata": {}, 3213 "output_type": "execute_result" 3214 } 3215 ], 3216 "source": [ 3217 "df.groupby('A').sum()" 3218 ] 3219 }, 3220 { 3221 "cell_type": "code", 3222 "execution_count": 57, 3223 "metadata": { 3224 "slideshow": { 3225 "slide_type": "slide" 3226 } 3227 }, 3228 "outputs": [ 3229 { 3230 "data": { 3231 "text/html": [ 3232 "<div>\n", 3233 "<style scoped>\n", 3234 " .dataframe tbody tr th:only-of-type {\n", 3235 " vertical-align: middle;\n", 3236 " }\n", 3237 "\n", 3238 " .dataframe tbody tr th {\n", 3239 " vertical-align: top;\n", 3240 " }\n", 3241 "\n", 3242 " .dataframe thead th {\n", 3243 " text-align: right;\n", 3244 " }\n", 3245 "</style>\n", 3246 "<table border=\"1\" class=\"dataframe\">\n", 3247 " <thead>\n", 3248 " <tr style=\"text-align: right;\">\n", 3249 " <th></th>\n", 3250 " <th></th>\n", 3251 " <th>C</th>\n", 3252 " <th>D</th>\n", 3253 " </tr>\n", 3254 " <tr>\n", 3255 " <th>A</th>\n", 3256 " <th>B</th>\n", 3257 " <th></th>\n", 3258 " <th></th>\n", 3259 " </tr>\n", 3260 " </thead>\n", 3261 " <tbody>\n", 3262 " <tr>\n", 3263 " <th rowspan=\"3\" valign=\"top\">bar</th>\n", 3264 " <th>one</th>\n", 3265 " <td>-1.709147</td>\n", 3266 " <td>-0.680945</td>\n", 3267 " </tr>\n", 3268 " <tr>\n", 3269 " <th>three</th>\n", 3270 " <td>0.917311</td>\n", 3271 " <td>-0.819363</td>\n", 3272 " </tr>\n", 3273 " <tr>\n", 3274 " <th>two</th>\n", 3275 " <td>0.904258</td>\n", 3276 " <td>-1.698605</td>\n", 3277 " </tr>\n", 3278 " <tr>\n", 3279 " <th rowspan=\"3\" valign=\"top\">foo</th>\n", 3280 " <th>one</th>\n", 3281 " <td>-3.191715</td>\n", 3282 " <td>1.170519</td>\n", 3283 " </tr>\n", 3284 " <tr>\n", 3285 " <th>three</th>\n", 3286 " 
<td>1.406019</td>\n", 3287 " <td>-0.603691</td>\n", 3288 " </tr>\n", 3289 " <tr>\n", 3290 " <th>two</th>\n", 3291 " <td>-1.015741</td>\n", 3292 " <td>-1.655484</td>\n", 3293 " </tr>\n", 3294 " </tbody>\n", 3295 "</table>\n", 3296 "</div>" 3297 ], 3298 "text/plain": [ 3299 " C D\n", 3300 "A B \n", 3301 "bar one -1.709147 -0.680945\n", 3302 " three 0.917311 -0.819363\n", 3303 " two 0.904258 -1.698605\n", 3304 "foo one -3.191715 1.170519\n", 3305 " three 1.406019 -0.603691\n", 3306 " two -1.015741 -1.655484" 3307 ] 3308 }, 3309 "execution_count": 57, 3310 "metadata": {}, 3311 "output_type": "execute_result" 3312 } 3313 ], 3314 "source": [ 3315 "df.groupby(['A', 'B']).sum()" 3316 ] 3317 } 3318 ], 3319 "metadata": { 3320 "celltoolbar": "Slideshow", 3321 "kernelspec": { 3322 "display_name": "Python 3", 3323 "language": "python", 3324 "name": "python3" 3325 }, 3326 "language_info": { 3327 "codemirror_mode": { 3328 "name": "ipython", 3329 "version": 3 3330 }, 3331 "file_extension": ".py", 3332 "mimetype": "text/x-python", 3333 "name": "python", 3334 "nbconvert_exporter": "python", 3335 "pygments_lexer": "ipython3", 3336 "version": "3.7.1" 3337 } 3338 }, 3339 "nbformat": 4, 3340 "nbformat_minor": 2 3341}