{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## 11. Librería Pandas" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "**pandas** es una librería *open source* que nos proporciona estructuras de datos y herramientas de análisis de datos potentes y fáciles de usar en Python." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Se puede instalar en nuestro entorno virtual con el siguiente comando:\n", "\n", "```\n", "pipenv install pandas\n", "```" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Se utiliza el alias `pd` como estándar de facto par el uso de **pandas**." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Series\n", "\n", "Una serie representa una secuencia de datos unidimensional, y se crea pasándole a pandas una lista de datos." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "0 1.0\n", "1 3.0\n", "2 5.0\n", "3 NaN\n", "4 6.0\n", "5 8.0\n", "dtype: float64" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s = pd.Series([1,3,5,np.nan,6,8])\n", "s" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### DataFrame\n", "\n", "Un objeto `DataFrame` representa una estructura tabular bi-dimensional que contiene datos potencialmente heterogéneos, con filas etiquetadas.\n", "\n", "Se pueden crear a partir de un diccionario, o de un `array` de NumPy." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [], "source": [ "df = pd.DataFrame({\n", " 'A' : [1., 2., np.nan, None],\n", " 'B' : pd.Timestamp('20130102'),\n", " 'C' : pd.Series(1,index=list(range(4)),dtype='float32'),\n", " 'D' : np.array([3] * 4,dtype='int32'),\n", " 'E' : pd.Categorical([\"test\",\"train\",\"test\",\"train\"]),\n", " 'F' : 'foo'\n", "})" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCDEF
01.02013-01-021.03testfoo
12.02013-01-021.03trainfoo
2NaN2013-01-021.03testfoo
3NaN2013-01-021.03trainfoo
\n", "
" ], "text/plain": [ " A B C D E F\n", "0 1.0 2013-01-02 1.0 3 test foo\n", "1 2.0 2013-01-02 1.0 3 train foo\n", "2 NaN 2013-01-02 1.0 3 test foo\n", "3 NaN 2013-01-02 1.0 3 train foo" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "A float64\n", "B datetime64[ns]\n", "C float32\n", "D int32\n", "E category\n", "F object\n", "dtype: object" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.dtypes" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Podemos utilizar una serie para especificar la columna de índice." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',\n", " '2013-01-05', '2013-01-06'],\n", " dtype='datetime64[ns]', freq='D')" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dates = pd.date_range('20130101', periods=6)\n", "dates" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
2013-01-01-0.679399-0.564244-0.395166-0.004622
2013-01-022.147829-0.991826-1.0048330.168517
2013-01-030.398068-0.536610-0.773990-1.075894
2013-01-04-1.1850111.988697-0.770427-0.472499
2013-01-05-0.3596340.3381760.1057860.359107
2013-01-06-0.5558801.115044-2.1081260.139896
\n", "
" ], "text/plain": [ " A B C D\n", "2013-01-01 -0.679399 -0.564244 -0.395166 -0.004622\n", "2013-01-02 2.147829 -0.991826 -1.004833 0.168517\n", "2013-01-03 0.398068 -0.536610 -0.773990 -1.075894\n", "2013-01-04 -1.185011 1.988697 -0.770427 -0.472499\n", "2013-01-05 -0.359634 0.338176 0.105786 0.359107\n", "2013-01-06 -0.555880 1.115044 -2.108126 0.139896" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2 = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD'))\n", "df2" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Ejes\n", "\n", "En un DataFrame de `pandas` se pueden realizar operaciones a lo largo de los dos ejes, o `axis`.\n", "\n", "- Si en una operación especificamos `axis=0` nos referimos a loas índices, es decir, estaremos diciendo que la operación se realiza para todas las filas.\n", "- Si en una operación especificamos `axis=1` estaremos diciendo que la operación se realiza para todas las columnas." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "![Axis](./img/axis.jpg)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Visualización de datos" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
2013-01-01-0.679399-0.564244-0.395166-0.004622
2013-01-022.147829-0.991826-1.0048330.168517
\n", "
" ], "text/plain": [ " A B C D\n", "2013-01-01 -0.679399 -0.564244 -0.395166 -0.004622\n", "2013-01-02 2.147829 -0.991826 -1.004833 0.168517" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2.head(2)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
2013-01-05-0.3596340.3381760.1057860.359107
2013-01-06-0.5558801.115044-2.1081260.139896
\n", "
" ], "text/plain": [ " A B C D\n", "2013-01-05 -0.359634 0.338176 0.105786 0.359107\n", "2013-01-06 -0.555880 1.115044 -2.108126 0.139896" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2.tail(2)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',\n", " '2013-01-05', '2013-01-06'],\n", " dtype='datetime64[ns]', freq='D')" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2.index" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "Index(['A', 'B', 'C', 'D'], dtype='object')" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2.columns" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### DataFrame.to_numpy()\n", "\n", "El método `.to_numpy()` de un DataFrame nos da una representación en una estructura de datos de `numpy` de los datos del DataFrame," ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "array([[-0.67939947, -0.5642441 , -0.39516608, -0.00462202],\n", " [ 2.14782856, -0.99182561, -1.00483345, 0.16851747],\n", " [ 0.39806756, -0.53661026, -0.77399033, -1.07589368],\n", " [-1.18501088, 1.98869725, -0.77042661, -0.47249893],\n", " [-0.35963418, 0.3381756 , 0.10578614, 0.35910665],\n", " [-0.55588001, 1.11504445, -2.10812582, 0.13989579]])" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2.to_numpy()\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Describe\n", "\n", "El método `.describe()` nos muestra un resumen estadístico de los datos." ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
count6.0000006.0000006.0000006.000000
mean-0.0390050.224873-0.824459-0.147582
std1.1888371.1488460.7396550.534241
min-1.185011-0.991826-2.108126-1.075894
25%-0.648520-0.557336-0.947123-0.355530
50%-0.457757-0.099217-0.7722080.067637
75%0.2086420.920827-0.4889810.161362
max2.1478291.9886970.1057860.359107
\n", "
" ], "text/plain": [ " A B C D\n", "count 6.000000 6.000000 6.000000 6.000000\n", "mean -0.039005 0.224873 -0.824459 -0.147582\n", "std 1.188837 1.148846 0.739655 0.534241\n", "min -1.185011 -0.991826 -2.108126 -1.075894\n", "25% -0.648520 -0.557336 -0.947123 -0.355530\n", "50% -0.457757 -0.099217 -0.772208 0.067637\n", "75% 0.208642 0.920827 -0.488981 0.161362\n", "max 2.147829 1.988697 0.105786 0.359107" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2.describe()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Transposición\n", "\n", "Podemos obtener el DataFrame transpuesto de uno dado a través del atributo `T`." ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
2013-01-012013-01-022013-01-032013-01-042013-01-052013-01-06
A-0.6793992.1478290.398068-1.185011-0.359634-0.555880
B-0.564244-0.991826-0.5366101.9886970.3381761.115044
C-0.395166-1.004833-0.773990-0.7704270.105786-2.108126
D-0.0046220.168517-1.075894-0.4724990.3591070.139896
\n", "
" ], "text/plain": [ " 2013-01-01 2013-01-02 2013-01-03 2013-01-04 2013-01-05 2013-01-06\n", "A -0.679399 2.147829 0.398068 -1.185011 -0.359634 -0.555880\n", "B -0.564244 -0.991826 -0.536610 1.988697 0.338176 1.115044\n", "C -0.395166 -1.004833 -0.773990 -0.770427 0.105786 -2.108126\n", "D -0.004622 0.168517 -1.075894 -0.472499 0.359107 0.139896" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2.T" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Ordenación\n", "\n", "Podemos ordenar los datos por alguno de los ejes o por valores." ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DCBA
2013-01-01-0.004622-0.395166-0.564244-0.679399
2013-01-020.168517-1.004833-0.9918262.147829
2013-01-03-1.075894-0.773990-0.5366100.398068
2013-01-04-0.472499-0.7704271.988697-1.185011
2013-01-050.3591070.1057860.338176-0.359634
2013-01-060.139896-2.1081261.115044-0.555880
\n", "
" ], "text/plain": [ " D C B A\n", "2013-01-01 -0.004622 -0.395166 -0.564244 -0.679399\n", "2013-01-02 0.168517 -1.004833 -0.991826 2.147829\n", "2013-01-03 -1.075894 -0.773990 -0.536610 0.398068\n", "2013-01-04 -0.472499 -0.770427 1.988697 -1.185011\n", "2013-01-05 0.359107 0.105786 0.338176 -0.359634\n", "2013-01-06 0.139896 -2.108126 1.115044 -0.555880" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2.sort_index(axis=1, ascending=False)" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
2013-01-022.147829-0.991826-1.0048330.168517
2013-01-01-0.679399-0.564244-0.395166-0.004622
2013-01-030.398068-0.536610-0.773990-1.075894
2013-01-05-0.3596340.3381760.1057860.359107
2013-01-06-0.5558801.115044-2.1081260.139896
2013-01-04-1.1850111.988697-0.770427-0.472499
\n", "
" ], "text/plain": [ " A B C D\n", "2013-01-02 2.147829 -0.991826 -1.004833 0.168517\n", "2013-01-01 -0.679399 -0.564244 -0.395166 -0.004622\n", "2013-01-03 0.398068 -0.536610 -0.773990 -1.075894\n", "2013-01-05 -0.359634 0.338176 0.105786 0.359107\n", "2013-01-06 -0.555880 1.115044 -2.108126 0.139896\n", "2013-01-04 -1.185011 1.988697 -0.770427 -0.472499" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2.sort_values(by='B')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Selección \n", "\n", "Podemos obtener una selección de los datos usando los métodos estándar de Python o `numpy` para la obtener *slices* en listas o matrices." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "Además, `pandas` proporcia métodos especializados (y optimizados) para el acceso a los datos:" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "`.loc`\n", "\n", "Se utiliza principalmente para acceder por etiqueta. Soporta los siguietnes tipos de entradas:\n", "\n", "- Una etiqueta única: df.loc['a']\n", "- Una lista o array de etiqueta: df.loc[['a', 'b', 'c']]\n", "- Un *slice* con etiquetas: df.loc[a':'f']" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "`.iloc`\n", "\n", "Se utiliza principalmente para acceder posición. Soporta los siguietnes tipos de entradas:\n", "\n", "- Una entero: df.iloc[0]\n", "- Una lista o array de enteros: df.iloc[[0, 1, 2]]\n", "- Un *slice* : df.loc[1:3]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "\n", "Tipo de objeto | Selección | Valor retornado\n", "---------------|----------------|-------------------------------------\n", "Series | series[label] | valor escalar\n", "DataFrame | frame[colname] | La serie correspondiente a `colname`" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "2013-01-01 -0.679399\n", "2013-01-02 2.147829\n", "2013-01-03 0.398068\n", "2013-01-04 -1.185011\n", "2013-01-05 -0.359634\n", "2013-01-06 -0.555880\n", "Freq: D, Name: A, dtype: float64" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2['A']" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "2013-01-01 -0.679399\n", "2013-01-02 2.147829\n", "2013-01-03 0.398068\n", "2013-01-04 -1.185011\n", "2013-01-05 -0.359634\n", "2013-01-06 -0.555880\n", "Freq: D, Name: A, dtype: float64" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2.A" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
2013-01-01-0.679399-0.564244-0.395166-0.004622
2013-01-022.147829-0.991826-1.0048330.168517
2013-01-030.398068-0.536610-0.773990-1.075894
\n", "
" ], "text/plain": [ " A B C D\n", "2013-01-01 -0.679399 -0.564244 -0.395166 -0.004622\n", "2013-01-02 2.147829 -0.991826 -1.004833 0.168517\n", "2013-01-03 0.398068 -0.536610 -0.773990 -1.075894" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2[0:3]" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
2013-01-022.147829-0.991826-1.0048330.168517
2013-01-030.398068-0.536610-0.773990-1.075894
2013-01-04-1.1850111.988697-0.770427-0.472499
\n", "
" ], "text/plain": [ " A B C D\n", "2013-01-02 2.147829 -0.991826 -1.004833 0.168517\n", "2013-01-03 0.398068 -0.536610 -0.773990 -1.075894\n", "2013-01-04 -1.185011 1.988697 -0.770427 -0.472499" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2['20130102':'20130104']" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "A -0.679399\n", "B -0.564244\n", "C -0.395166\n", "D -0.004622\n", "Name: 2013-01-01 00:00:00, dtype: float64" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2.loc[dates[0]]" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AB
2013-01-01-0.679399-0.564244
2013-01-022.147829-0.991826
2013-01-030.398068-0.536610
2013-01-04-1.1850111.988697
2013-01-05-0.3596340.338176
2013-01-06-0.5558801.115044
\n", "
" ], "text/plain": [ " A B\n", "2013-01-01 -0.679399 -0.564244\n", "2013-01-02 2.147829 -0.991826\n", "2013-01-03 0.398068 -0.536610\n", "2013-01-04 -1.185011 1.988697\n", "2013-01-05 -0.359634 0.338176\n", "2013-01-06 -0.555880 1.115044" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2.loc[:, ['A', 'B']]" ] }, { "cell_type": "code", "execution_count": 29, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AB
2013-01-022.147829-0.991826
2013-01-030.398068-0.536610
2013-01-04-1.1850111.988697
\n", "
" ], "text/plain": [ " A B\n", "2013-01-02 2.147829 -0.991826\n", "2013-01-03 0.398068 -0.536610\n", "2013-01-04 -1.185011 1.988697" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2.loc['20130102':'20130104', ['A', 'B']]" ] }, { "cell_type": "code", "execution_count": 30, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "A 2.147829\n", "B -0.991826\n", "Name: 2013-01-02 00:00:00, dtype: float64" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2.loc['20130102', ['A', 'B']]" ] }, { "cell_type": "code", "execution_count": 31, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "A -1.185011\n", "B 1.988697\n", "C -0.770427\n", "D -0.472499\n", "Name: 2013-01-04 00:00:00, dtype: float64" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2.iloc[3]" ] }, { "cell_type": "code", "execution_count": 32, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AB
2013-01-04-1.1850111.988697
2013-01-05-0.3596340.338176
\n", "
" ], "text/plain": [ " A B\n", "2013-01-04 -1.185011 1.988697\n", "2013-01-05 -0.359634 0.338176" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2.iloc[3:5, 0:2]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Indexación condicional\n", "\n", "Se puede acceder a las columnas que cumplan una condición concreta, indicando la condición en el selector." ] }, { "cell_type": "code", "execution_count": 33, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
2013-01-022.147829-0.991826-1.0048330.168517
2013-01-030.398068-0.536610-0.773990-1.075894
\n", "
" ], "text/plain": [ " A B C D\n", "2013-01-02 2.147829 -0.991826 -1.004833 0.168517\n", "2013-01-03 0.398068 -0.536610 -0.773990 -1.075894" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2[df2.A > 0]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Operaciones" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Se pueden realizar operaciones estadísticas básicas llamando a los métodos correspondientes." ] }, { "cell_type": "code", "execution_count": 34, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "A -0.039005\n", "B 0.224873\n", "C -0.824459\n", "D -0.147582\n", "dtype: float64" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2.mean()" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "-0.03900473868952752" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2['A'].mean()" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "2013-01-01 -0.410858\n", "2013-01-02 0.079922\n", "2013-01-03 -0.497107\n", "2013-01-04 -0.109810\n", "2013-01-05 0.110859\n", "2013-01-06 -0.352266\n", "Freq: D, dtype: float64" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2.mean(axis=1)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Se pueden aplicar funciones a los datos." ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
2013-01-01-0.679399-0.564244-0.395166-0.004622
2013-01-021.468429-1.556070-1.4000000.163895
2013-01-031.866497-2.092680-2.173990-0.911998
2013-01-040.681486-0.103983-2.944416-1.384497
2013-01-050.3218520.234193-2.838630-1.025391
2013-01-06-0.2340281.349237-4.946756-0.885495
\n", "
" ], "text/plain": [ " A B C D\n", "2013-01-01 -0.679399 -0.564244 -0.395166 -0.004622\n", "2013-01-02 1.468429 -1.556070 -1.400000 0.163895\n", "2013-01-03 1.866497 -2.092680 -2.173990 -0.911998\n", "2013-01-04 0.681486 -0.103983 -2.944416 -1.384497\n", "2013-01-05 0.321852 0.234193 -2.838630 -1.025391\n", "2013-01-06 -0.234028 1.349237 -4.946756 -0.885495" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2.apply(np.cumsum)" ] }, { "cell_type": "code", "execution_count": 40, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCDE
2013-01-01-0.679399-0.564244-0.395166-0.0046220.674777
2013-01-022.147829-0.991826-1.0048330.1685173.152662
2013-01-030.398068-0.536610-0.773990-1.0758941.473961
2013-01-04-1.1850111.988697-0.770427-0.4724993.173708
2013-01-05-0.3596340.3381760.1057860.3591070.718741
2013-01-06-0.5558801.115044-2.1081260.1398963.223170
\n", "
" ], "text/plain": [ " A B C D E\n", "2013-01-01 -0.679399 -0.564244 -0.395166 -0.004622 0.674777\n", "2013-01-02 2.147829 -0.991826 -1.004833 0.168517 3.152662\n", "2013-01-03 0.398068 -0.536610 -0.773990 -1.075894 1.473961\n", "2013-01-04 -1.185011 1.988697 -0.770427 -0.472499 3.173708\n", "2013-01-05 -0.359634 0.338176 0.105786 0.359107 0.718741\n", "2013-01-06 -0.555880 1.115044 -2.108126 0.139896 3.223170" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "c = df2.apply(lambda x: x.max() - x.min(), axis=1)\n", "df2['E'] = c\n", "df2" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "2013-01-01 1.354177\n", "2013-01-02 4.157495\n", "2013-01-03 2.549855\n", "2013-01-04 4.358719\n", "2013-01-05 1.078375\n", "2013-01-06 5.331296\n", "Freq: D, dtype: float64" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2.apply(lambda x: x.max() - x.min(), axis=1)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Uniones\n", "\n", "La librería `pandas` proporciona diferentes métodos para la unión de Series o DataFrame." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Concat" ] }, { "cell_type": "code", "execution_count": 42, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
0123
0-0.6344500.7637240.710228-0.694768
1-0.1426161.6307041.029687-1.008484
2-0.344466-0.2229170.294177-0.859483
31.012883-0.369916-0.5527841.356238
4-0.1670021.677076-0.4547671.183958
5-0.528190-0.9123890.7867531.043857
60.527898-0.3794711.537252-1.050597
7-0.352473-1.8255710.1865760.977988
80.991172-0.030169-1.8160310.601092
91.5229680.440188-1.7632891.840091
\n", "
" ], "text/plain": [ " 0 1 2 3\n", "0 -0.634450 0.763724 0.710228 -0.694768\n", "1 -0.142616 1.630704 1.029687 -1.008484\n", "2 -0.344466 -0.222917 0.294177 -0.859483\n", "3 1.012883 -0.369916 -0.552784 1.356238\n", "4 -0.167002 1.677076 -0.454767 1.183958\n", "5 -0.528190 -0.912389 0.786753 1.043857\n", "6 0.527898 -0.379471 1.537252 -1.050597\n", "7 -0.352473 -1.825571 0.186576 0.977988\n", "8 0.991172 -0.030169 -1.816031 0.601092\n", "9 1.522968 0.440188 -1.763289 1.840091" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame(np.random.randn(10, 4))\n", "df" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "[ 0 1 2 3\n", " 0 -0.634450 0.763724 0.710228 -0.694768\n", " 1 -0.142616 1.630704 1.029687 -1.008484\n", " 2 -0.344466 -0.222917 0.294177 -0.859483,\n", " 0 1 2 3\n", " 3 1.012883 -0.369916 -0.552784 1.356238\n", " 4 -0.167002 1.677076 -0.454767 1.183958\n", " 5 -0.528190 -0.912389 0.786753 1.043857\n", " 6 0.527898 -0.379471 1.537252 -1.050597,\n", " 0 1 2 3\n", " 7 -0.352473 -1.825571 0.186576 0.977988\n", " 8 0.991172 -0.030169 -1.816031 0.601092\n", " 9 1.522968 0.440188 -1.763289 1.840091]" ] }, "execution_count": 44, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pieces = [df[:3], df[3:7], df[7:]]\n", "pieces" ] }, { "cell_type": "code", "execution_count": 45, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
0123
0-0.6344500.7637240.710228-0.694768
1-0.1426161.6307041.029687-1.008484
2-0.344466-0.2229170.294177-0.859483
31.012883-0.369916-0.5527841.356238
4-0.1670021.677076-0.4547671.183958
5-0.528190-0.9123890.7867531.043857
60.527898-0.3794711.537252-1.050597
7-0.352473-1.8255710.1865760.977988
80.991172-0.030169-1.8160310.601092
91.5229680.440188-1.7632891.840091
\n", "
" ], "text/plain": [ " 0 1 2 3\n", "0 -0.634450 0.763724 0.710228 -0.694768\n", "1 -0.142616 1.630704 1.029687 -1.008484\n", "2 -0.344466 -0.222917 0.294177 -0.859483\n", "3 1.012883 -0.369916 -0.552784 1.356238\n", "4 -0.167002 1.677076 -0.454767 1.183958\n", "5 -0.528190 -0.912389 0.786753 1.043857\n", "6 0.527898 -0.379471 1.537252 -1.050597\n", "7 -0.352473 -1.825571 0.186576 0.977988\n", "8 0.991172 -0.030169 -1.816031 0.601092\n", "9 1.522968 0.440188 -1.763289 1.840091" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.concat(pieces)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Join" ] }, { "cell_type": "code", "execution_count": 46, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
keylval
0foo1
1foo2
\n", "
" ], "text/plain": [ " key lval\n", "0 foo 1\n", "1 foo 2" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "left = pd.DataFrame({'key': ['foo', 'foo'], 'lval': [1, 2]})\n", "left" ] }, { "cell_type": "code", "execution_count": 48, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
keyrval
0foo4
1foo5
\n", "
" ], "text/plain": [ " key rval\n", "0 foo 4\n", "1 foo 5" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "right = pd.DataFrame({'key': ['foo', 'foo'], 'rval': [4, 5]})\n", "right" ] }, { "cell_type": "code", "execution_count": 49, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
keylvalrval
0foo14
1foo15
2foo24
3foo25
\n", "
" ], "text/plain": [ " key lval rval\n", "0 foo 1 4\n", "1 foo 1 5\n", "2 foo 2 4\n", "3 foo 2 5" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.merge(left, right, on='key')" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### Append" ] }, { "cell_type": "code", "execution_count": 50, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
0-0.362774-0.5739080.0980441.992482
11.4376670.940580-0.355047-0.142454
2-1.097556-0.593504-1.313146-0.490131
31.0289890.0980310.8812770.426499
4-0.589829-0.3314040.6921640.456827
5-0.158751-0.199149-0.3951950.882798
6-0.0216480.7643840.408657-1.262260
7-1.1134060.1072560.420511-0.968303
\n", "
" ], "text/plain": [ " A B C D\n", "0 -0.362774 -0.573908 0.098044 1.992482\n", "1 1.437667 0.940580 -0.355047 -0.142454\n", "2 -1.097556 -0.593504 -1.313146 -0.490131\n", "3 1.028989 0.098031 0.881277 0.426499\n", "4 -0.589829 -0.331404 0.692164 0.456827\n", "5 -0.158751 -0.199149 -0.395195 0.882798\n", "6 -0.021648 0.764384 0.408657 -1.262260\n", "7 -1.113406 0.107256 0.420511 -0.968303" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame(np.random.randn(8, 4), columns=['A', 'B', 'C', 'D'])\n", "df" ] }, { "cell_type": "code", "execution_count": 51, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/plain": [ "A 1.028989\n", "B 0.098031\n", "C 0.881277\n", "D 0.426499\n", "Name: 3, dtype: float64" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "s = df.iloc[3]\n", "s" ] }, { "cell_type": "code", "execution_count": 52, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
0-0.362774-0.5739080.0980441.992482
11.4376670.940580-0.355047-0.142454
2-1.097556-0.593504-1.313146-0.490131
31.0289890.0980310.8812770.426499
4-0.589829-0.3314040.6921640.456827
5-0.158751-0.199149-0.3951950.882798
6-0.0216480.7643840.408657-1.262260
7-1.1134060.1072560.420511-0.968303
81.0289890.0980310.8812770.426499
\n", "
" ], "text/plain": [ " A B C D\n", "0 -0.362774 -0.573908 0.098044 1.992482\n", "1 1.437667 0.940580 -0.355047 -0.142454\n", "2 -1.097556 -0.593504 -1.313146 -0.490131\n", "3 1.028989 0.098031 0.881277 0.426499\n", "4 -0.589829 -0.331404 0.692164 0.456827\n", "5 -0.158751 -0.199149 -0.395195 0.882798\n", "6 -0.021648 0.764384 0.408657 -1.262260\n", "7 -1.113406 0.107256 0.420511 -0.968303\n", "8 1.028989 0.098031 0.881277 0.426499" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.append(s, ignore_index=True)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Agrupamientos\n", "\n", "Cuando hablamos de agrupar datos en `pandas` nos referimos a un proceso que inplica uno o más de los siguientes pasos:\n", "\n", "- Separar los datos en grupos basados en algún criterio\n", "- Aplicar una función para cada grupo de forma independiente\n", "- Combinar los resultados en una estructura de datos" ] }, { "cell_type": "code", "execution_count": 53, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
0fooone-1.976302-0.708903
1barone-1.709147-0.680945
2footwo0.229683-0.613908
3barthree0.917311-0.819363
4footwo-1.245424-1.041576
5bartwo0.904258-1.698605
6fooone-1.2154141.879422
7foothree1.406019-0.603691
\n", "
" ], "text/plain": [ " A B C D\n", "0 foo one -1.976302 -0.708903\n", "1 bar one -1.709147 -0.680945\n", "2 foo two 0.229683 -0.613908\n", "3 bar three 0.917311 -0.819363\n", "4 foo two -1.245424 -1.041576\n", "5 bar two 0.904258 -1.698605\n", "6 foo one -1.215414 1.879422\n", "7 foo three 1.406019 -0.603691" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',\n", " 'foo', 'bar', 'foo', 'foo'],\n", " 'B': ['one', 'one', 'two', 'three',\n", " 'two', 'two', 'one', 'three'],\n", " 'C': np.random.randn(8),\n", " 'D': np.random.randn(8)})\n", "df" ] }, { "cell_type": "code", "execution_count": 56, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CD
A
bar0.112423-3.198913
foo-2.801438-1.088656
\n", "
" ], "text/plain": [ " C D\n", "A \n", "bar 0.112423 -3.198913\n", "foo -2.801438 -1.088656" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby('A').sum()" ] }, { "cell_type": "code", "execution_count": 57, "metadata": { "slideshow": { "slide_type": "slide" } }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
CD
AB
barone-1.709147-0.680945
three0.917311-0.819363
two0.904258-1.698605
fooone-3.1917151.170519
three1.406019-0.603691
two-1.015741-1.655484
\n", "
" ], "text/plain": [ " C D\n", "A B \n", "bar one -1.709147 -0.680945\n", " three 0.917311 -0.819363\n", " two 0.904258 -1.698605\n", "foo one -3.191715 1.170519\n", " three 1.406019 -0.603691\n", " two -1.015741 -1.655484" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.groupby(['A', 'B']).sum()" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.1" } }, "nbformat": 4, "nbformat_minor": 2 }