A monorepo containing jupyter-blocks and jupyter-tidyblocks. Blockly extension for JupyterLab.

update to include and harmonize all dplyr verbs

+618 -37
+1
.gitignore
···
  jupyterlab_blockly/_version.py
  /.turbo
  /jupyter_tidyblocks/labextension/static
+ .yarn/install-state.gz
+30 -3
CHANGELOG.md
···

  ### New features

- - **`packages/tidyblocks`**: new monorepo package providing 50+ tidy-data analysis blocks organized into seven categories — Data, Transform, Combine, Plot, Stats, Value, and Op — with Python (pandas / plotly.express / scipy / sklearn) code generators
+ - **`packages/tidyblocks`**: new monorepo package providing 60+ tidy-data analysis blocks organized into seven categories — Data, Transform, Combine, Plot, Stats, Values, and Operations — with Python (pandas / plotly.express / scipy / sklearn) code generators
  - Exports `registerTidyblocks(registry)` for registering all blocks and the Tidy Data toolbox with any `IBlocklyRegistry` instance
+ - Block names aligned with [dplyr (tidyverse)](https://dplyr.tidyverse.org/) conventions; 7 blocks renamed and 10 new blocks added (see below)
+
+ #### dplyr alignment — renames
+
+ | Old | New | dplyr verb |
+ |---|---|---|
+ | create column | mutate | `mutate()` |
+ | sort by | arrange by | `arrange()` |
+ | unique by | distinct by | `distinct()` |
+ | first N rows | slice_head | `slice_head()` |
+ | last N rows | slice_tail | `slice_tail()` |
+ | sample N rows | slice_sample | `slice_sample()` |
+ | glue with | bind_rows with | `bind_rows()` |
+
+ #### dplyr alignment — new blocks
+
+ - **Transform**: `count()`, `relocate()`, `slice_min()`, `slice_max()`
+ - **Combine**: `semi_join()`, `anti_join()`, `bind_cols()`
+ - **Operations**: `between()`, `coalesce()`, `n_distinct()`
+ - **summarize** block: added `n distinct` aggregate function option

  ### Rebrand & metadata

···
  - `packages/tidyblocks/src/index.ts`: `registry.addToolbox` → `registry.registerToolbox` (correct method name on `IBlocklyRegistry`)
  - Root `tsconfig.json`: added `"lib": ["ES2020", "DOM"]` to resolve `Intl.ResolvedRelativeTimeFormatOptions` error from `@jupyterlab/coreutils`

+ ### Package manager
+
+ - Migrated from Yarn 4 to npm; `yarn.lock` / `.yarnrc.yml` / `.yarn/` removed; `"resolutions"` → `"overrides"`; `jlpm` replaced with `npm run` in all scripts
+ - `yarn.lock` added to `.gitignore` (regenerated as a build side-effect by `@jupyterlab/builder`'s bundled jlpm)
+
  ### Docs

- - `docs/jupyterlab-blockly_architecture.md`: full architecture document (refers to the upstream project as `jupyterlab-blockly`)
- - `docs/tidyblocks-features.md`: feature inventory and port plan from gvwilson/tidyblocks
+ - `docs/getting-started.md`: step-by-step guide for installing and testing the extension in JupyterLab
+ - `docs/architecture.md`: full architecture reference (package layout, data-flow, extension points)
+ - `docs/blocks-reference.md`: complete block reference with dplyr mapping, description, and generated Python for every block
+ - `docs/work-summary.md`: narrative summary of all engineering work done in this release
  - `docs/modernization-plan.md`: full modernization plan with phase-by-phase status
  - `README.md`: rewritten; credits Greg Wilson's tidyblocks and QuantStack/jupyterlab-blockly

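The seven renames listed in this changelog also change the internal block type names (for example `tidyblocks_transform_sort` becomes `tidyblocks_transform_arrange`), so a workspace saved before the change may reference types that are no longer registered. The sketch below shows one way such a file could be migrated. It assumes the saved workspace is JSON in which each block object carries a `type` string; the exact `.jpblockly` layout is not shown in this change, and `migrateBlockTypes` is a hypothetical helper, not part of the package.

```typescript
// Old -> new block type names, taken from the renames in this change.
const RENAMED_BLOCK_TYPES = new Map<string, string>([
  ['tidyblocks_transform_create', 'tidyblocks_transform_mutate'],
  ['tidyblocks_transform_sort', 'tidyblocks_transform_arrange'],
  ['tidyblocks_transform_unique', 'tidyblocks_transform_distinct'],
  ['tidyblocks_transform_head', 'tidyblocks_transform_slice_head'],
  ['tidyblocks_transform_tail', 'tidyblocks_transform_slice_tail'],
  ['tidyblocks_transform_sample', 'tidyblocks_transform_slice_sample'],
  ['tidyblocks_combine_glue', 'tidyblocks_combine_bind_rows']
]);

// Hypothetical helper: walks any JSON value and rewrites `type` strings
// that match an old block type. Everything else is copied unchanged.
function migrateBlockTypes(node: unknown): unknown {
  if (Array.isArray(node)) {
    return node.map(migrateBlockTypes);
  }
  if (node !== null && typeof node === 'object') {
    const out: Record<string, unknown> = {};
    for (const [key, value] of Object.entries(node as Record<string, unknown>)) {
      if (key === 'type' && typeof value === 'string') {
        out[key] = RENAMED_BLOCK_TYPES.get(value) ?? value;
      } else {
        out[key] = migrateBlockTypes(value);
      }
    }
    return out;
  }
  return node;
}
```

Only values that exactly match an old type name are rewritten, so unrelated `type` keys pass through untouched.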
+223
docs/blocks-reference.md
···
+ # Block Reference
+
+ Complete reference for all blocks available in the **Tidy Data** toolbox.
+ Blocks are organized by category, matching the sidebar in the Blockly editor.
+
+ Block names follow [dplyr (tidyverse)](https://dplyr.tidyverse.org/) conventions
+ where an equivalent verb exists. The **dplyr equivalent** column shows the R
+ function that inspired the block; a dash (—) means there is no direct
+ dplyr analogue.
+
+ ---
+
+ ## Data `#FEBE4C`
+
+ Source blocks start a pipeline. They create a DataFrame stored in `_df` and
+ have no top connector (nothing chains into them).
+
+ | Block label | Block type | dplyr equivalent | What it does | Python generated |
+ |---|---|---|---|---|
+ | penguins dataset | `tidyblocks_data_penguins` | — | Palmer Penguins dataset loaded via seaborn | `_df = sns.load_dataset('penguins')` |
+ | colors dataset | `tidyblocks_data_colors` | — | Built-in table of 11 colors with RGB values | `_df = pd.DataFrame({...})` |
+ | earthquakes dataset | `tidyblocks_data_earthquakes` | — | 2016 global earthquake data from gvwilson/tidyblocks | `_df = pd.read_csv('<url>')` |
+ | sequence 1 to N as col | `tidyblocks_data_sequence` | — | Integer sequence 1..N in a named column | `_df = pd.DataFrame({'col': range(1, N+1)})` |
+ | dataset named name | `tidyblocks_data_user` | — | Reference a DataFrame previously saved with **save as** | `_df = name.copy()` |
+ | read CSV path | `tidyblocks_data_csv` | — | Load a CSV file from a local or remote path | `_df = pd.read_csv('path')` |
+
+ ---
+
+ ## Transform `#76AADB`
+
+ Transform blocks read from and write back to `_df`. They can be chained in
+ any order between a source block and a terminal block.
+
+ ### Row operations
+
+ | Block label | Block type | dplyr equivalent | What it does | Python generated |
+ |---|---|---|---|---|
+ | filter where cond | `tidyblocks_transform_filter` | `filter()` | Keep only rows where the condition is `True` | `_df = _df[cond]` |
+ | arrange by cols ↑↓ | `tidyblocks_transform_arrange` | `arrange()` | Sort rows by one or more columns, ascending or descending | `_df = _df.sort_values(by=[...], ascending=True/False)` |
+ | distinct by cols | `tidyblocks_transform_distinct` | `distinct()` | Remove duplicate rows, keeping one per unique combination of columns | `_df = _df.drop_duplicates(subset=[...])` |
+ | slice_head N rows | `tidyblocks_transform_slice_head` | `slice_head()` | Keep the first N rows | `_df = _df.head(N)` |
+ | slice_tail N rows | `tidyblocks_transform_slice_tail` | `slice_tail()` | Keep the last N rows | `_df = _df.tail(N)` |
+ | slice_sample N rows | `tidyblocks_transform_slice_sample` | `slice_sample()` | Randomly sample N rows | `_df = _df.sample(n=N)` |
+ | slice_min N rows by col | `tidyblocks_transform_slice_min` | `slice_min()` | Keep the N rows with the smallest values in a column | `_df = _df.nsmallest(N, 'col')` |
+ | slice_max N rows by col | `tidyblocks_transform_slice_max` | `slice_max()` | Keep the N rows with the largest values in a column | `_df = _df.nlargest(N, 'col')` |
+ | drop rows with missing in cols | `tidyblocks_transform_dropna` | — (`tidyr::drop_na`) | Remove rows that have missing values in the specified columns | `_df = _df.dropna(subset=[...])` |
+
+ ### Column operations
+
+ | Block label | Block type | dplyr equivalent | What it does | Python generated |
+ |---|---|---|---|---|
+ | select columns cols | `tidyblocks_transform_select` | `select()` | Keep only the named columns | `_df = _df[[...]]` |
+ | drop columns cols | `tidyblocks_transform_drop` | `select(-col)` | Remove the named columns | `_df = _df.drop(columns=[...])` |
+ | mutate col = expr | `tidyblocks_transform_mutate` | `mutate()` | Add a new column or overwrite an existing one with an expression | `_df = _df.assign(**{'col': expr})` |
+ | rename old to new | `tidyblocks_transform_rename` | `rename()` | Rename a single column | `_df = _df.rename(columns={'old': 'new'})` |
+ | relocate cols before/after anchor | `tidyblocks_transform_relocate` | `relocate()` | Move one or more columns to a new position relative to an anchor column | reorders `_df.columns` |
+ | fill missing in col with val | `tidyblocks_transform_fillna` | — (`tidyr::replace_na`) | Replace missing values in a column with a given value | `_df = _df.assign(**{'col': _df['col'].fillna(val)})` |
+
+ ### Grouping & aggregation
+
+ | Block label | Block type | dplyr equivalent | What it does | Python generated |
+ |---|---|---|---|---|
+ | group by cols | `tidyblocks_transform_groupby` | `group_by()` | Group rows by the values in one or more columns for use with summarize or running | `_df = _df.groupby([...], as_index=False)` |
+ | ungroup | `tidyblocks_transform_ungroup` | `ungroup()` | Remove grouping and reset the row index | `_df = _df.reset_index(drop=True)` |
+ | summarize fn of col as result | `tidyblocks_transform_summarize` | `summarize()` | Aggregate each group (or the whole DataFrame) to a single row using count / sum / mean / median / min / max / std / var / n distinct / any / all | `_df = _df.agg(**{'result': ('col', 'fn')}).reset_index()` |
+ | count by cols | `tidyblocks_transform_count` | `count()` | Count rows for each unique combination of the specified columns | `_df = _df.groupby([...], as_index=False).size().rename(columns={'size': 'n'})` |
+ | running fn of col as result | `tidyblocks_transform_running` | — (window fns) | Compute a cumulative operation (cumsum / cummax / cummin / cummean / row index) across rows | `_df = _df.assign(**{'result': _df['col'].cumsum()})` etc. |
+
+ ### Utilities
+
+ | Block label | Block type | dplyr equivalent | What it does | Python generated |
+ |---|---|---|---|---|
+ | bin col into N buckets as result | `tidyblocks_transform_bin` | — (`cut()`) | Discretize a numeric column into N equal-width interval buckets | `_df = _df.assign(**{'result': pd.cut(_df['col'], bins=N).astype(str)})` |
+ | save as name | `tidyblocks_transform_saveas` | — | Copy the current DataFrame into a named Python variable for later use with **dataset named** | `name = _df.copy()` |
+ | display table | `tidyblocks_transform_display` | — | Render the current DataFrame as an HTML table in the output cell | `display(_df)` |
+
+ ---
+
+ ## Combine `#808080`
+
+ Combine blocks merge the current `_df` with a second DataFrame that was
+ previously saved using **save as**.
+
+ ### Mutating joins (add columns from the right table)
+
+ | Block label | Block type | dplyr equivalent | What it does | Python generated |
+ |---|---|---|---|---|
+ | inner/left/right/outer join other on left col = right col | `tidyblocks_combine_join` | `inner_join()` / `left_join()` / `right_join()` / `full_join()` | Join two DataFrames on matching key columns. Choose inner (only matching rows), left (all left rows), right (all right rows), or outer (all rows from both) | `_df = pd.merge(_df, other, left_on='lk', right_on='rk', how='...')` |
+ | cross join with other | `tidyblocks_combine_cross_join` | `cross_join()` | Cartesian product — every row in `_df` paired with every row in `other` | `_df = _df.merge(other, how='cross')` |
+
+ ### Filtering joins (keep/remove rows based on a match, no new columns)
+
+ | Block label | Block type | dplyr equivalent | What it does | Python generated |
+ |---|---|---|---|---|
+ | semi join other on left col = right col | `tidyblocks_combine_semi_join` | `semi_join()` | Keep only rows in `_df` that have a matching key in `other`. No columns from `other` are added. | `_df = _df[_df['lk'].isin(other['rk'])]` |
+ | anti join other on left col = right col | `tidyblocks_combine_anti_join` | `anti_join()` | Keep only rows in `_df` that have **no** matching key in `other` | `_df = _df[~_df['lk'].isin(other['rk'])]` |
+
+ ### Binding (stack or glue tables together)
+
+ | Block label | Block type | dplyr equivalent | What it does | Python generated |
+ |---|---|---|---|---|
+ | bind_rows with other label column src | `tidyblocks_combine_bind_rows` | `bind_rows()` | Vertically stack `_df` on top of `other`, adding a label column to identify the source of each row | `_df = pd.concat([_df.assign(src='left'), other.assign(src='right')]).reset_index(drop=True)` |
+ | bind_cols with other | `tidyblocks_combine_bind_cols` | `bind_cols()` | Horizontally bind `_df` and `other` by column position. Both tables must have the same number of rows. | `_df = pd.concat([_df, other], axis=1)` |
+
+ ---
+
+ ## Plot `#A4C588`
+
+ Plot blocks are terminal — they render a chart and have no bottom connector.
+ All plots use [Plotly Express](https://plotly.com/python/plotly-express/).
+
+ | Block label | Block type | dplyr equivalent | What it does | Python generated |
+ |---|---|---|---|---|
+ | bar chart x col y col | `tidyblocks_plot_bar` | — | Vertical bar chart | `px.bar(_df, x='col', y='col')` |
+ | box plot x col y col | `tidyblocks_plot_box` | — | Box-and-whisker plot showing median, IQR, and outliers | `px.box(_df, x='col', y='col')` |
+ | dot plot x col | `tidyblocks_plot_dot` | — | Strip/dot plot — one point per row along an axis | `px.strip(_df, x='col')` |
+ | histogram of col bins N | `tidyblocks_plot_histogram` | — | Frequency histogram with N bins | `px.histogram(_df, x='col', nbins=N)` |
+ | scatter plot x col y col color col trendline ☐ | `tidyblocks_plot_scatter` | — | Scatter plot with optional color grouping and OLS trendline | `px.scatter(_df, x=..., y=..., color=..., trendline=...)` |
+ | line chart x col y col color col | `tidyblocks_plot_line` | — | Line chart with optional color grouping | `px.line(_df, x=..., y=..., color=...)` |
+ | violin plot x col y col | `tidyblocks_plot_violin` | — | Violin plot showing the distribution shape | `px.violin(_df, x='col', y='col')` |
+ | correlation heatmap | `tidyblocks_plot_heatmap` | — | Heatmap of pairwise Pearson correlations between all numeric columns | `px.imshow(_df.corr())` |
+
+ ---
+
+ ## Stats `#BA93DB`
+
+ Stats blocks are terminal — they print results and have no bottom connector.
+ All stats use [scipy.stats](https://docs.scipy.org/doc/scipy/reference/stats.html)
+ and [scikit-learn](https://scikit-learn.org/).
+
+ | Block label | Block type | dplyr equivalent | What it does | Python generated |
+ |---|---|---|---|---|
+ | one-sample t-test column col vs mean μ | `tidyblocks_stats_ttest_one` | — | Two-sided one-sample t-test: tests whether the column mean equals μ. Prints t-statistic and p-value. | `stats.ttest_1samp(_df['col'], μ)` |
+ | two-sample t-test groups in group_col values in val_col | `tidyblocks_stats_ttest_two` | — | Two-sided two-sample t-test: splits rows into two groups and tests whether their means differ | `stats.ttest_ind(group_a, group_b)` |
+ | k-means x col y col k N label col | `tidyblocks_stats_kmeans` | — | K-means clustering on two columns; adds a cluster label column to `_df` | `KMeans(n_clusters=N).fit_predict(...)` |
+ | silhouette score x col y col labels col score col | `tidyblocks_stats_silhouette` | — | Computes the silhouette coefficient for existing cluster labels; adds a score column | `silhouette_score(X, labels)` |
+ | Pearson/Spearman/Kendall correlation of col_a and col_b | `tidyblocks_stats_correlation` | — | Computes pairwise correlation coefficient and p-value between two columns | `stats.pearsonr / spearmanr / kendalltau` |
+ | describe | `tidyblocks_stats_describe` | — | Prints `DataFrame.describe()` — count, mean, std, min, quartiles, max for every numeric column | `display(_df.describe())` |
+
+ ---
+
+ ## Values `#E7553C`
+
+ Value blocks are expression blocks — they produce a value and connect into
+ input slots on transform or operation blocks. They do not have statement
+ connectors.
+
+ | Block label | Block type | dplyr equivalent | What it does | Python generated |
+ |---|---|---|---|---|
+ | column col | `tidyblocks_value_column` | — | Reference a DataFrame column by name | `_df['col']` |
+ | number | `tidyblocks_value_number` | — | A numeric literal | `0`, `3.14`, etc. |
+ | "text" | `tidyblocks_value_text` | — | A string literal | `'text'` |
+ | true / false | `tidyblocks_value_logical` | — | A boolean literal | `True` / `False` |
+ | date YYYY-MM-DD | `tidyblocks_value_datetime` | — | A date/time constant | `pd.Timestamp('YYYY-MM-DD')` |
+ | missing | `tidyblocks_value_missing` | `NA` | An explicit missing (NA/NaN) value | `float("nan")` |
+ | Normal(mean μ std σ) | `tidyblocks_value_normal` | — | Draw a column of values from a Normal distribution | `np.random.normal(μ, σ, len(_df))` |
+ | Uniform(low a high b) | `tidyblocks_value_uniform` | — | Draw a column of values from a Uniform distribution | `np.random.uniform(a, b, len(_df))` |
+ | Exponential(lambda λ) | `tidyblocks_value_exponential` | — | Draw a column of values from an Exponential distribution | `np.random.exponential(1/λ, len(_df))` |
+
+ ---
+
+ ## Operations `#F9B5B2`
+
+ Operation blocks are expression blocks used inside **filter**, **mutate**,
+ **fill missing**, and similar blocks. They take value inputs and return a
+ computed value.
+
+ ### Numeric & comparison
+
+ | Block label | Block type | dplyr equivalent | What it does | Python generated |
+ |---|---|---|---|---|
+ | a + b, a - b, a × b, a ÷ b, a % b, a ** b | `tidyblocks_op_arithmetic` | — | Standard arithmetic on two values | `(a + b)`, `(a * b)`, etc. |
+ | a = b, a ≠ b, a < b, a ≤ b, a > b, a ≥ b | `tidyblocks_op_compare` | — | Element-wise comparison, returning a boolean column | `(a == b)`, `(a < b)`, etc. |
+ | x between left and right | `tidyblocks_op_between` | `between()` | Return `True` for values within the inclusive range `[left, right]` | `x.between(left, right)` |
+ | abs / round / floor / ceil / sqrt / log / exp ( val ) | `tidyblocks_op_math` | — | Apply a standard math function to a column | `val.abs()`, `np.sqrt(val)`, etc. |
+
+ ### Logic
+
+ | Block label | Block type | dplyr equivalent | What it does | Python generated |
+ |---|---|---|---|---|
+ | a AND b / a OR b | `tidyblocks_op_logic` | — | Element-wise logical AND/OR on two boolean columns | `(a & b)`, `(a \| b)` |
+ | NOT val | `tidyblocks_op_not` | — | Element-wise logical NOT | `~(val)` |
+ | if cond then x else y | `tidyblocks_op_ifelse` | `if_else()` | Return `x` where `cond` is `True`, `y` elsewhere | `np.where(cond, x, y)` |
+ | coalesce val with replacement | `tidyblocks_op_coalesce` | `coalesce()` | Replace missing values in `val` with values from `replacement` | `val.fillna(replacement)` |
+
+ ### Type operations
+
+ | Block label | Block type | dplyr equivalent | What it does | Python generated |
+ |---|---|---|---|---|
+ | val is missing / is number / is text / is date / is boolean | `tidyblocks_op_typecheck` | — | Test whether each element matches a specific type | `val.isna()`, `val.apply(isinstance(...))`, etc. |
+ | convert val to number / text / bool / datetime | `tidyblocks_op_convert` | — | Cast a column to a different type | `pd.to_numeric(val)`, `val.astype(str)`, etc. |
+
+ ### Date & time
+
+ | Block label | Block type | dplyr equivalent | What it does | Python generated |
+ |---|---|---|---|---|
+ | year / month / day / weekday / hour / minute / second of val | `tidyblocks_op_datetime` | — | Extract a calendar component from a datetime column | `val.dt.year`, `val.dt.month`, etc. |
+
+ ### Window & ranking
+
+ | Block label | Block type | dplyr equivalent | What it does | Python generated |
+ |---|---|---|---|---|
+ | shift val by N | `tidyblocks_op_shift` | `lag()` / `lead()` | Shift values forward (positive N = lag) or backward (negative N = lead) | `val.shift(N)` |
+ | n_distinct val | `tidyblocks_op_n_distinct` | `n_distinct()` | Count the number of distinct (unique) values in a column | `val.nunique()` |
+
+ ### String
+
+ | Block label | Block type | dplyr equivalent | What it does | Python generated |
+ |---|---|---|---|---|
+ | val . upper / lower / strip / length | `tidyblocks_op_string` | — | Apply a string operation to a text column | `val.str.upper()`, `val.str.len()`, etc. |
+ | val contains pattern | `tidyblocks_op_str_contains` | `stringr::str_detect()` | Return `True` where the string column matches a pattern | `val.str.contains('pattern', na=False)` |
+
+ ---
+
+ ## Pipeline rules
+
+ | Block role | Has top connector | Has bottom connector | Examples |
+ |---|---|---|---|
+ | **Source** | No | Yes | all Data blocks |
+ | **Transform** | Yes | Yes | filter, mutate, arrange, … |
+ | **Terminal** | Yes | No | display table, all Plot blocks, all Stats blocks |
+
+ A valid pipeline must be: **one source → zero or more transforms → one terminal**.
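The pipeline rule at the end of the block reference (one source, zero or more transforms, one terminal) can be expressed as a small check over a top-to-bottom list of block roles. This is a sketch only: the `BlockRole` type and `validatePipeline` function are hypothetical names, and the connector table already implies the same constraint inside the editor.

```typescript
// Roles from the "Pipeline rules" table.
type BlockRole = 'source' | 'transform' | 'terminal';

// Returns null when the sequence is a valid pipeline, otherwise a short reason.
function validatePipeline(roles: BlockRole[]): string | null {
  if (roles.length < 2) {
    return 'a pipeline needs at least a source and a terminal block';
  }
  if (roles[0] !== 'source') {
    return 'the first block must be a source';
  }
  if (roles[roles.length - 1] !== 'terminal') {
    return 'the last block must be a terminal';
  }
  // Everything between the first and last block must be a transform.
  for (let i = 1; i < roles.length - 1; i++) {
    if (roles[i] !== 'transform') {
      return `block ${i + 1} must be a transform`;
    }
  }
  return null;
}
```

For example, `['source', 'transform', 'terminal']` passes, while a list that ends in a transform is rejected because nothing renders or prints the result.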
+76 -2
docs/work-summary.md
···

  ---

- ## 11. Documentation
+ ## 12. npm migration
+
+ **Problem:** The project used Yarn 4 (`packageManager: "yarn@4.6.0"`) but a
+ stray `package-lock.json` and npm-installed `node_modules` had accumulated
+ alongside it, creating an inconsistent state.
+
+ **What was done:**
+ - Removed `packageManager: "yarn@4.6.0"` and replaced with `"npm@11.1.0"`.
+ - Converted `"resolutions"` → `"overrides"` (npm 8.3+ equivalent).
+ - Converted `"workspaces"` from Yarn's `{ "packages": [...] }` object form to
+ npm's array form `["packages/*"]`.
+ - Replaced all `jlpm` references in `packages/blockly-extension/package.json`
+ scripts with `npm run`.
+ - Replaced `jlpm` references in root `lint` / `prettier` scripts with
+ `npm run`.
+ - Deleted `.yarnrc.yml`, `.yarn/` cache directory, and `yarn.lock`.
+ - Updated `.gitignore` to ignore `node_modules/`, `package-lock.json`, and
+ `yarn.lock` (the last is a build side-effect from `@jupyterlab/builder`'s
+ bundled `jlpm`, which cannot be avoided).
+
+ **Note:** `jlpm` (a yarn shim bundled inside the `jupyterlab` Python package)
+ is called internally by `jupyter labextension build` and will always
+ regenerate `yarn.lock` during a build. This is an implementation detail of
+ `@jupyterlab/builder` that cannot be configured away; the file is gitignored.
+
+ ---
+
+ ## 13. dplyr alignment and new blocks
+
+ **Motivation:** dplyr (R tidyverse) is the reference vocabulary for tidy-data
+ analysis. Aligning block names to dplyr verbs makes the extension more
+ intuitive for data scientists familiar with either R or the tidy-data
+ paradigm.
+
+ ### Renames (7 blocks)
+
+ | Old block label | New block label | dplyr verb |
+ |---|---|---|
+ | `create column` | `mutate` | `mutate()` |
+ | `sort by` | `arrange by` | `arrange()` |
+ | `unique by` | `distinct by` | `distinct()` |
+ | `first N rows` | `slice_head N rows` | `slice_head()` |
+ | `last N rows` | `slice_tail N rows` | `slice_tail()` |
+ | `sample N rows` | `slice_sample N rows` | `slice_sample()` |
+ | `glue with` | `bind_rows with` | `bind_rows()` |
+
+ Internal block type names were updated to match
+ (e.g. `tidyblocks_transform_create` → `tidyblocks_transform_mutate`).
+
+ ### New blocks (10 blocks)
+
+ **Transform**
+ - `count by cols` — `count()`: count rows per combination of columns
+ - `relocate cols before/after anchor` — `relocate()`: move columns to a new position
+ - `slice_min N rows by col` — `slice_min()`: keep N rows with smallest values
+ - `slice_max N rows by col` — `slice_max()`: keep N rows with largest values
+
+ **Combine**
+ - `semi join` — `semi_join()`: filtering join, keep matched rows (no new columns)
+ - `anti join` — `anti_join()`: filtering join, keep unmatched rows
+ - `bind_cols with` — `bind_cols()`: horizontally bind two DataFrames by column position
+
+ **Operations**
+ - `between left and right` — `between()`: inclusive range check
+ - `coalesce val with replacement` — `coalesce()`: first non-missing value
+ - `n_distinct val` — `n_distinct()`: count unique values
+
+ Also added `n distinct` as an option to the **summarize** block's function dropdown.
+
+ ---
+
+ ## 14. Documentation
+
+ (was §11)

  | Document | Description |
  |---|---|
  | `docs/getting-started.md` | Step-by-step guide: install, launch JupyterLab, create a `.jpblockly` file, build a penguins pipeline, run it, and see output. |
  | `docs/modernization-plan.md` | Updated to reflect completed phases, corrected version numbers (JupyterLab 4.5 not 4.6), and added a new Phase 6 documenting all the fixes in this work. |
- | `docs/architecture.md` *(this work)* | Full architecture reference: package layout, data-flow diagram, object relationships, block pipeline conventions, code generation pattern, and extension points. |
+ | `docs/architecture.md` | Full architecture reference: package layout, data-flow diagram, object relationships, block pipeline conventions, code generation pattern, and extension points. |
+ | `docs/blocks-reference.md` | Complete block reference: every block organized by category, with its block type name, dplyr equivalent, description, and generated Python. |
  | `CHANGELOG.md` | Rewrote the `0.1.0` entry with accurate versions and full sections covering all new features, the rebrand, dependency upgrades, build fixes, bug fixes, and docs. |
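The manifest changes described in the npm migration section of the work summary (`packageManager`, `"resolutions"` to `"overrides"`, and the `"workspaces"` object form to the array form) can be sketched as a pure function over a parsed `package.json`. `yarnToNpmManifest` is a hypothetical helper written for illustration; it assumes the `resolutions` keys are plain package names, since nested or pattern-style selectors may need manual conversion, and it does not touch `scripts`.

```typescript
type Manifest = Record<string, unknown>;

// Sketch: returns a copy of `pkg` with the Yarn-specific fields converted.
function yarnToNpmManifest(pkg: Manifest, npmVersion: string): Manifest {
  const out: Manifest = { ...pkg };

  // packageManager: "yarn@x.y.z" -> "npm@<npmVersion>"
  out.packageManager = `npm@${npmVersion}`;

  // "resolutions" (Yarn) -> "overrides" (npm 8.3+)
  if (out.resolutions !== undefined) {
    out.overrides = out.resolutions;
    delete out.resolutions;
  }

  // Yarn's { "packages": [...] } object form -> npm's array form
  const workspaces = out.workspaces;
  if (
    workspaces !== null &&
    typeof workspaces === 'object' &&
    !Array.isArray(workspaces)
  ) {
    const packages = (workspaces as { packages?: unknown }).packages;
    if (Array.isArray(packages)) {
      out.workspaces = packages;
    }
  }
  return out;
}
```

The input object is left unmodified, so the result can be compared against the original before anything is written back to disk.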
+1 -1
jupyter_tidyblocks/labextension/package.json
···
      }
    },
    "_build": {
-     "load": "static/remoteEntry.898fe8e70b536ba95137.js",
+     "load": "static/remoteEntry.a2e6fa6b678931659f31.js",
      "extension": "./extension",
      "style": "./style"
    }
+3 -3
packages/blockly-extension/src/index.ts
···
      // Handle state restoration.
      if (restorer) {
        // When restoring the app, if the document was open, reopen it
-       restorer.restore(tracker, {
+       restorer.restore(tracker as any, {
          command: 'docmanager:open',
-         args: widget => ({ path: widget.context.path, factory: FACTORY }),
-         name: widget => widget.context.path
+         args: (widget: any) => ({ path: widget.context.path, factory: FACTORY }),
+         name: (widget: any) => widget.context.path
        });
      }

+47 -2
packages/tidyblocks/src/blocks/combine.ts
···

  Blockly.defineBlocksWithJsonArray([
    {
+     // dplyr: inner_join / left_join / right_join / full_join
      type: 'tidyblocks_combine_join',
      message0: '%1 join %2 on left %3 = right %4',
      args0: [
···
        'Join the current DataFrame with a named DataFrame on matching columns.'
    },
    {
-     type: 'tidyblocks_combine_glue',
-     message0: 'glue with %1 label column %2',
+     // dplyr: bind_rows() — vertically stack two DataFrames
+     type: 'tidyblocks_combine_bind_rows',
+     message0: 'bind_rows with %1 label column %2',
      args0: [
        { type: 'field_input', name: 'OTHER_DF', text: 'other_df' },
        { type: 'field_input', name: 'LABEL_COL', text: 'source' }
···
        'Vertically stack the current DataFrame with another, adding a label column.'
    },
    {
+     // dplyr: cross_join() — Cartesian product
      type: 'tidyblocks_combine_cross_join',
      message0: 'cross join with %1',
      args0: [{ type: 'field_input', name: 'RIGHT_DF', text: 'other_df' }],
···
      nextStatement: null,
      colour: '#808080',
      tooltip: 'Cartesian product of the current DataFrame with another.'
+   },
+   {
+     // dplyr: semi_join() — keep rows in _df that have a match in other_df
+     // (no columns from other_df are added)
+     type: 'tidyblocks_combine_semi_join',
+     message0: 'semi join %1 on left %2 = right %3',
+     args0: [
+       { type: 'field_input', name: 'RIGHT_DF', text: 'other_df' },
+       { type: 'field_input', name: 'LEFT_ON', text: 'id' },
+       { type: 'field_input', name: 'RIGHT_ON', text: 'id' }
+     ],
+     previousStatement: null,
+     nextStatement: null,
+     colour: '#808080',
+     tooltip:
+       'Keep only rows from the current DataFrame that have a match in the other DataFrame. No columns from the other DataFrame are added.'
+   },
+   {
+     // dplyr: anti_join() — keep rows in _df that have NO match in other_df
+     type: 'tidyblocks_combine_anti_join',
+     message0: 'anti join %1 on left %2 = right %3',
+     args0: [
+       { type: 'field_input', name: 'RIGHT_DF', text: 'other_df' },
+       { type: 'field_input', name: 'LEFT_ON', text: 'id' },
+       { type: 'field_input', name: 'RIGHT_ON', text: 'id' }
+     ],
+     previousStatement: null,
+     nextStatement: null,
+     colour: '#808080',
+     tooltip:
+       'Keep only rows from the current DataFrame that have no match in the other DataFrame.'
+   },
+   {
+     // dplyr: bind_cols() — horizontally bind two DataFrames by column position
+     type: 'tidyblocks_combine_bind_cols',
+     message0: 'bind_cols with %1',
+     args0: [{ type: 'field_input', name: 'OTHER_DF', text: 'other_df' }],
+     previousStatement: null,
+     nextStatement: null,
+     colour: '#808080',
+     tooltip:
+       'Horizontally bind the current DataFrame with another by column position. Both must have the same number of rows.'
    }
  ]);
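The block definitions above only describe the fields (`RIGHT_DF`, `LEFT_ON`, `RIGHT_ON`); the Python they are meant to produce is listed in the block reference as `_df = _df[_df['lk'].isin(other['rk'])]` for the semi join and the same mask negated with `~` for the anti join. The sketch below shows how a generator could assemble those strings from the field values. The generator source is not part of this diff, so `filteringJoin`, `pyString`, and `pyIdentifier` are hypothetical helpers written without any Blockly dependency.

```typescript
// Quote a value as a single-quoted Python string literal.
function pyString(value: string): string {
  return "'" + value.replace(/\\/g, '\\\\').replace(/'/g, "\\'") + "'";
}

// DataFrame names are emitted as bare Python identifiers, so reject anything else.
function pyIdentifier(name: string): string {
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(name)) {
    throw new Error(`not a valid Python identifier: ${name}`);
  }
  return name;
}

interface FilteringJoinFields {
  rightDf: string; // RIGHT_DF field
  leftOn: string; // LEFT_ON field
  rightOn: string; // RIGHT_ON field
}

// keepMatches = true gives a semi join, false gives an anti join.
function filteringJoin(fields: FilteringJoinFields, keepMatches: boolean): string {
  const right = pyIdentifier(fields.rightDf);
  const mask = `_df[${pyString(fields.leftOn)}].isin(${right}[${pyString(fields.rightOn)}])`;
  return keepMatches ? `_df = _df[${mask}]\n` : `_df = _df[~${mask}]\n`;
}
```

With the default field values, `filteringJoin({ rightDf: 'other_df', leftOn: 'id', rightOn: 'id' }, true)` yields `_df = _df[_df['id'].isin(other_df['id'])]`. For `bind_cols`, note that pandas aligns `pd.concat(..., axis=1)` on the index rather than on row position, so DataFrames that were filtered earlier in a pipeline may need `reset_index(drop=True)` before binding.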
+37
packages/tidyblocks/src/blocks/op.ts
···
      colour: '#F9B5B2',
      inputsInline: true,
      tooltip: 'Check whether a string column contains a pattern.'
+   },
+   // dplyr: between() — test whether values fall within an inclusive range
+   {
+     type: 'tidyblocks_op_between',
+     message0: '%1 between %2 and %3',
+     args0: [
+       { type: 'input_value', name: 'VALUE' },
+       { type: 'field_number', name: 'LEFT', value: 0 },
+       { type: 'field_number', name: 'RIGHT', value: 1 }
+     ],
+     output: 'Boolean',
+     colour: '#F9B5B2',
+     inputsInline: true,
+     tooltip: 'Return True for values within the inclusive range [left, right].'
+   },
+   // dplyr: coalesce() — return the first non-missing value across columns
+   {
+     type: 'tidyblocks_op_coalesce',
+     message0: 'coalesce %1 with %2',
+     args0: [
+       { type: 'input_value', name: 'VALUE' },
+       { type: 'input_value', name: 'REPLACEMENT' }
+     ],
+     output: null,
+     colour: '#F9B5B2',
+     inputsInline: true,
+     tooltip: 'Replace missing values in a column with values from another column or expression.'
+   },
+   // dplyr: n_distinct() — count of unique values in a column
+   {
+     type: 'tidyblocks_op_n_distinct',
+     message0: 'n_distinct %1',
+     args0: [{ type: 'input_value', name: 'VALUE' }],
+     output: 'Number',
+     colour: '#F9B5B2',
+     inputsInline: true,
+     tooltip: 'Count the number of distinct (unique) values in a column.'
    }
  ]);
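The three new operation blocks are expression blocks, so their generators return a Python expression rather than a statement. According to the block reference, the targets are `x.between(left, right)`, `val.fillna(replacement)`, and `val.nunique()`. A minimal sketch of that mapping follows, assuming the code for each plugged-in input has already been generated as a string. The function names and the `requireCode` guard are hypothetical, and the real generators are not shown in this diff.

```typescript
// An unconnected input would produce an empty string; fail early instead of
// emitting broken Python such as ".nunique()".
function requireCode(code: string, inputName: string): string {
  if (code.trim() === '') {
    throw new Error(`input ${inputName} is not connected`);
  }
  return code;
}

// VALUE input plus LEFT / RIGHT number fields -> inclusive range test.
function betweenExpr(valueCode: string, left: number, right: number): string {
  if (!Number.isFinite(left) || !Number.isFinite(right)) {
    throw new Error('range bounds must be finite numbers');
  }
  return `${requireCode(valueCode, 'VALUE')}.between(${left}, ${right})`;
}

// VALUE and REPLACEMENT inputs -> missing values filled from the replacement.
function coalesceExpr(valueCode: string, replacementCode: string): string {
  const value = requireCode(valueCode, 'VALUE');
  const replacement = requireCode(replacementCode, 'REPLACEMENT');
  return `${value}.fillna(${replacement})`;
}

// VALUE input -> count of distinct values.
function nDistinctExpr(valueCode: string): string {
  return `${requireCode(valueCode, 'VALUE')}.nunique()`;
}
```

These compose with the value blocks: a column block that generates `_df['bill_length_mm']` plugged into `between` with bounds 40 and 50 gives `_df['bill_length_mm'].between(40, 50)`, which pandas evaluates inclusively on both ends by default.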
+81 -12
packages/tidyblocks/src/blocks/transform.ts
···
   ['max', 'max'],
   ['std dev', 'std'],
   ['variance', 'var'],
+  ['n distinct', 'nunique'],
   ['any', 'any'],
   ['all', 'all']
 ];
···
     tooltip: 'Keep only rows matching a condition.'
   },
   {
+    // dplyr: select() — keep named columns
     type: 'tidyblocks_transform_select',
     message0: 'select columns %1',
     args0: [{ type: 'field_input', name: 'COLUMNS', text: 'col1, col2' }],
···
     tooltip: 'Remove the specified columns (comma-separated).'
   },
   {
-    type: 'tidyblocks_transform_create',
-    message0: 'create column %1 as %2',
+    // dplyr: mutate() — create or overwrite a column
+    type: 'tidyblocks_transform_mutate',
+    message0: 'mutate %1 = %2',
     args0: [
       { type: 'field_input', name: 'COLUMN', text: 'new_col' },
       { type: 'input_value', name: 'EXPRESSION' }
···
     tooltip: 'Add or replace a column using an expression.'
   },
   {
+    // dplyr: rename() — rename a single column
     type: 'tidyblocks_transform_rename',
     message0: 'rename %1 to %2',
     args0: [
···
     tooltip: 'Rename a column.'
   },
   {
-    type: 'tidyblocks_transform_sort',
-    message0: 'sort by %1 %2',
+    // dplyr: arrange() — order rows by column values
+    type: 'tidyblocks_transform_arrange',
+    message0: 'arrange by %1 %2',
     args0: [
       { type: 'field_input', name: 'COLUMNS', text: 'col1' },
       {
···
     tooltip: 'Sort rows by one or more columns (comma-separated).'
   },
   {
-    type: 'tidyblocks_transform_unique',
-    message0: 'unique by %1',
+    // dplyr: distinct() — keep unique rows
+    type: 'tidyblocks_transform_distinct',
+    message0: 'distinct by %1',
     args0: [{ type: 'field_input', name: 'COLUMNS', text: 'col1' }],
     previousStatement: null,
     nextStatement: null,
···
     tooltip: 'Keep only rows with distinct values in the specified columns.'
   },
   {
+    // dplyr: group_by() — group rows by column values
     type: 'tidyblocks_transform_groupby',
     message0: 'group by %1',
     args0: [{ type: 'field_input', name: 'COLUMNS', text: 'col1' }],
···
     tooltip: 'Group rows by column values for subsequent summarize or running.'
   },
   {
+    // dplyr: ungroup() — remove grouping
     type: 'tidyblocks_transform_ungroup',
     message0: 'ungroup',
     previousStatement: null,
···
     tooltip: 'Remove grouping and reset the index.'
   },
   {
+    // dplyr: summarize() — aggregate groups to one row each
     type: 'tidyblocks_transform_summarize',
     message0: 'summarize %1 of %2 as %3',
     args0: [
···
     tooltip: 'Drop rows that have missing values in the specified columns.'
   },
   {
-    type: 'tidyblocks_transform_sample',
-    message0: 'sample %1 rows',
+    // dplyr: slice_sample() — random sample of N rows
+    type: 'tidyblocks_transform_slice_sample',
+    message0: 'slice_sample %1 rows',
     args0: [
       { type: 'field_number', name: 'N', value: 10, min: 1, precision: 1 }
     ],
···
     tooltip: 'Randomly sample N rows from the DataFrame.'
   },
   {
-    type: 'tidyblocks_transform_head',
-    message0: 'first %1 rows',
+    // dplyr: slice_head() — first N rows
+    type: 'tidyblocks_transform_slice_head',
+    message0: 'slice_head %1 rows',
     args0: [
       { type: 'field_number', name: 'N', value: 10, min: 1, precision: 1 }
     ],
···
     tooltip: 'Keep only the first N rows.'
   },
   {
-    type: 'tidyblocks_transform_tail',
-    message0: 'last %1 rows',
+    // dplyr: slice_tail() — last N rows
+    type: 'tidyblocks_transform_slice_tail',
+    message0: 'slice_tail %1 rows',
     args0: [
       { type: 'field_number', name: 'N', value: 10, min: 1, precision: 1 }
     ],
···
     nextStatement: null,
     colour: '#76AADB',
     tooltip: 'Keep only the last N rows.'
+  },
+  {
+    // dplyr: slice_min() — N rows with smallest values in a column
+    type: 'tidyblocks_transform_slice_min',
+    message0: 'slice_min %1 rows by %2',
+    args0: [
+      { type: 'field_number', name: 'N', value: 5, min: 1, precision: 1 },
+      { type: 'field_input', name: 'COLUMN', text: 'col1' }
+    ],
+    previousStatement: null,
+    nextStatement: null,
+    colour: '#76AADB',
+    tooltip: 'Keep the N rows with the smallest values in a column.'
+  },
+  {
+    // dplyr: slice_max() — N rows with largest values in a column
+    type: 'tidyblocks_transform_slice_max',
+    message0: 'slice_max %1 rows by %2',
+    args0: [
+      { type: 'field_number', name: 'N', value: 5, min: 1, precision: 1 },
+      { type: 'field_input', name: 'COLUMN', text: 'col1' }
+    ],
+    previousStatement: null,
+    nextStatement: null,
+    colour: '#76AADB',
+    tooltip: 'Keep the N rows with the largest values in a column.'
+  },
+  {
+    // dplyr: count() — count rows per group (or total if ungrouped)
+    type: 'tidyblocks_transform_count',
+    message0: 'count by %1',
+    args0: [{ type: 'field_input', name: 'COLUMNS', text: 'col1' }],
+    previousStatement: null,
+    nextStatement: null,
+    colour: '#76AADB',
+    tooltip: 'Count rows for each combination of the specified columns.'
+  },
+  {
+    // dplyr: relocate() — move column(s) to before or after another column
+    type: 'tidyblocks_transform_relocate',
+    message0: 'relocate %1 %2 %3',
+    args0: [
+      { type: 'field_input', name: 'COLUMNS', text: 'col1' },
+      {
+        type: 'field_dropdown',
+        name: 'POSITION',
+        options: [
+          ['before', 'before'],
+          ['after', 'after']
+        ]
+      },
+      { type: 'field_input', name: 'ANCHOR', text: 'col2' }
+    ],
+    previousStatement: null,
+    nextStatement: null,
+    colour: '#76AADB',
+    tooltip: 'Move one or more columns to before or after a reference column.'
   },
   {
     type: 'tidyblocks_transform_display',
+31 -1
packages/tidyblocks/src/generators/python/combine.ts
···
 import { pythonGenerator, Order } from 'blockly/python';

+// dplyr: inner_join / left_join / right_join / full_join
 pythonGenerator.forBlock['tidyblocks_combine_join'] = block => {
   const how = block.getFieldValue('HOW');
   const rightDf = block.getFieldValue('RIGHT_DF');
···
   );
 };

-pythonGenerator.forBlock['tidyblocks_combine_glue'] = block => {
+// dplyr: bind_rows() — vertically stack with a label column
+pythonGenerator.forBlock['tidyblocks_combine_bind_rows'] = block => {
   const otherDf = block.getFieldValue('OTHER_DF');
   const labelCol = block.getFieldValue('LABEL_COL');
   return (
···
   );
 };

+// dplyr: cross_join() — Cartesian product
 pythonGenerator.forBlock['tidyblocks_combine_cross_join'] = block => {
   const rightDf = block.getFieldValue('RIGHT_DF');
   return `_df = _df.merge(${rightDf}, how='cross')\n`;
+};
+
+// dplyr: semi_join() — filtering join; keep rows that have a match
+// pandas has no native semi_join, so we use merge + filtering
+pythonGenerator.forBlock['tidyblocks_combine_semi_join'] = block => {
+  const rightDf = block.getFieldValue('RIGHT_DF');
+  const leftOn = block.getFieldValue('LEFT_ON');
+  const rightOn = block.getFieldValue('RIGHT_ON');
+  return (
+    `_df = _df[_df['${leftOn}'].isin(${rightDf}['${rightOn}'])]\n`
+  );
+};
+
+// dplyr: anti_join() — filtering join; keep rows that have no match
+pythonGenerator.forBlock['tidyblocks_combine_anti_join'] = block => {
+  const rightDf = block.getFieldValue('RIGHT_DF');
+  const leftOn = block.getFieldValue('LEFT_ON');
+  const rightOn = block.getFieldValue('RIGHT_ON');
+  return (
+    `_df = _df[~_df['${leftOn}'].isin(${rightDf}['${rightOn}'])]\n`
+  );
+};
+
+// dplyr: bind_cols() — horizontally bind by column position
+pythonGenerator.forBlock['tidyblocks_combine_bind_cols'] = block => {
+  const otherDf = block.getFieldValue('OTHER_DF');
+  return `_df = pd.concat([_df.reset_index(drop=True), ${otherDf}.reset_index(drop=True)], axis=1)\n`;
 };

 export { Order };
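The semi_join and anti_join generators in this file filter with `isin` on a single key column and interpolate the `LEFT_ON` and `RIGHT_ON` field text straight into single-quoted Python string literals. A column name containing a quote or backslash would then presumably produce invalid Python. The following is a minimal sketch of one possible guard, not part of this commit: `pyStr` and `filterJoinCode` are hypothetical helper names, and the sketch only escapes backslashes and single quotes.

```typescript
// Hypothetical helper (not in the commit): render a JS string as a
// single-quoted Python string literal, escaping backslashes and quotes.
function pyStr(value: string): string {
  const escaped = value.replace(/\\/g, '\\\\').replace(/'/g, "\\'");
  return `'${escaped}'`;
}

// Hypothetical helper: build the same isin-based filter the generators emit,
// with the column names passed through pyStr. `anti` selects anti_join.
function filterJoinCode(
  rightDf: string,
  leftOn: string,
  rightOn: string,
  anti: boolean
): string {
  const negate = anti ? '~' : '';
  return `_df = _df[${negate}_df[${pyStr(leftOn)}].isin(${rightDf}[${pyStr(rightOn)}])]\n`;
}

// Usage: the generated line for an anti_join on a column with a quote in it.
console.log(filterJoinCode('other', "user's id", 'key', true));
```

The dataframe name (`rightDf`) is left unescaped here because it is emitted as a Python identifier rather than a string; validating it would be a separate concern.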
+21
packages/tidyblocks/src/generators/python/op.ts
···
   return [`(${val}).str.contains('${pattern}', na=False)`, Order.FUNCTION_CALL];
 };

+// dplyr: between(x, left, right) — inclusive range check
+pythonGenerator.forBlock['tidyblocks_op_between'] = (block, generator) => {
+  const val = generator.valueToCode(block, 'VALUE', Order.NONE) || '_df.iloc[:, 0]';
+  const left = block.getFieldValue('LEFT');
+  const right = block.getFieldValue('RIGHT');
+  return [`(${val}).between(${left}, ${right})`, Order.FUNCTION_CALL];
+};
+
+// dplyr: coalesce(x, y) — first non-missing value
+pythonGenerator.forBlock['tidyblocks_op_coalesce'] = (block, generator) => {
+  const val = generator.valueToCode(block, 'VALUE', Order.NONE) || '_df.iloc[:, 0]';
+  const replacement = generator.valueToCode(block, 'REPLACEMENT', Order.NONE) || 'None';
+  return [`(${val}).fillna(${replacement})`, Order.FUNCTION_CALL];
+};
+
+// dplyr: n_distinct(x) — count unique values
+pythonGenerator.forBlock['tidyblocks_op_n_distinct'] = (block, generator) => {
+  const val = generator.valueToCode(block, 'VALUE', Order.NONE) || '_df.iloc[:, 0]';
+  return [`(${val}).nunique()`, Order.FUNCTION_CALL];
+};
+
 export { Order };
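The three generators above delegate to pandas `between`, `fillna`, and `nunique`. As a rough illustration of the semantics those calls have with default arguments, here is a small model on plain arrays, using `null` for a missing value. The function names and the `Cell` type are assumptions made for this sketch; it is not the package's code and it does not call pandas.

```typescript
// Illustrative model only: `null` stands in for a missing value (NaN/None).
type Cell = number | null;

// Inclusive range check; a missing value is never "between".
function between(xs: Cell[], left: number, right: number): boolean[] {
  return xs.map(x => x !== null && x >= left && x <= right);
}

// Element-wise: keep x where present, otherwise take the replacement.
function coalesce(xs: Cell[], replacement: Cell[]): Cell[] {
  return xs.map((x, i) => (x !== null ? x : replacement[i]));
}

// Count distinct non-missing values (missing values are skipped).
function nDistinct(xs: Cell[]): number {
  return new Set(xs.filter(x => x !== null)).size;
}

console.log(between([0, 0.5, 1, 2, null], 0, 1));
console.log(coalesce([1, null, 3], [9, 8, 7]));
console.log(nDistinct([1, 1, 2, null]));
```

One difference worth keeping in mind: pandas `nunique()` skips missing values by default, whereas dplyr's `n_distinct()` counts `NA` as a value unless `na.rm = TRUE`, so the block may return one less than its dplyr namesake on columns with missing data.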
+47 -6
packages/tidyblocks/src/generators/python/transform.ts
···
   return `_df = _df.drop(columns=${cols})\n`;
 };

-pythonGenerator.forBlock['tidyblocks_transform_create'] = (
+// dplyr: mutate() — create or overwrite a column
+pythonGenerator.forBlock['tidyblocks_transform_mutate'] = (
   block,
   generator
 ) => {
···
   return `_df = _df.assign(**{'${col}': ${expr}})\n`;
 };

+// dplyr: rename() — rename new_name = old_name
 pythonGenerator.forBlock['tidyblocks_transform_rename'] = block => {
   const oldName = block.getFieldValue('OLD_NAME');
   const newName = block.getFieldValue('NEW_NAME');
   return `_df = _df.rename(columns={'${oldName}': '${newName}'})\n`;
 };

-pythonGenerator.forBlock['tidyblocks_transform_sort'] = block => {
+// dplyr: arrange() — order rows by column values
+pythonGenerator.forBlock['tidyblocks_transform_arrange'] = block => {
   const cols = toCols(block.getFieldValue('COLUMNS'));
   const asc = block.getFieldValue('ORDER');
   return `_df = _df.sort_values(by=${cols}, ascending=${asc})\n`;
 };

-pythonGenerator.forBlock['tidyblocks_transform_unique'] = block => {
+// dplyr: distinct() — keep unique rows
+pythonGenerator.forBlock['tidyblocks_transform_distinct'] = block => {
   const cols = toCols(block.getFieldValue('COLUMNS'));
   return `_df = _df.drop_duplicates(subset=${cols})\n`;
 };
···
   return `_df = _df.dropna(subset=${cols})\n`;
 };

-pythonGenerator.forBlock['tidyblocks_transform_sample'] = block => {
+// dplyr: slice_sample() — random N rows
+pythonGenerator.forBlock['tidyblocks_transform_slice_sample'] = block => {
   const n = block.getFieldValue('N');
   return `_df = _df.sample(n=${n})\n`;
 };

-pythonGenerator.forBlock['tidyblocks_transform_head'] = block => {
+// dplyr: slice_head() — first N rows
+pythonGenerator.forBlock['tidyblocks_transform_slice_head'] = block => {
   const n = block.getFieldValue('N');
   return `_df = _df.head(${n})\n`;
 };

-pythonGenerator.forBlock['tidyblocks_transform_tail'] = block => {
+// dplyr: slice_tail() — last N rows
+pythonGenerator.forBlock['tidyblocks_transform_slice_tail'] = block => {
   const n = block.getFieldValue('N');
   return `_df = _df.tail(${n})\n`;
+};
+
+// dplyr: slice_min() — N rows with smallest values in a column
+pythonGenerator.forBlock['tidyblocks_transform_slice_min'] = block => {
+  const n = block.getFieldValue('N');
+  const col = block.getFieldValue('COLUMN');
+  return `_df = _df.nsmallest(${n}, '${col}')\n`;
+};
+
+// dplyr: slice_max() — N rows with largest values in a column
+pythonGenerator.forBlock['tidyblocks_transform_slice_max'] = block => {
+  const n = block.getFieldValue('N');
+  const col = block.getFieldValue('COLUMN');
+  return `_df = _df.nlargest(${n}, '${col}')\n`;
+};
+
+// dplyr: count() — count rows per combination of columns
+pythonGenerator.forBlock['tidyblocks_transform_count'] = block => {
+  const cols = toCols(block.getFieldValue('COLUMNS'));
+  return `_df = _df.groupby(${cols}, as_index=False).size().rename(columns={'size': 'n'})\n`;
+};
+
+// dplyr: relocate() — move columns before or after a reference column
+pythonGenerator.forBlock['tidyblocks_transform_relocate'] = block => {
+  const cols = block.getFieldValue('COLUMNS').split(',').map((c: string) => c.trim());
+  const position = block.getFieldValue('POSITION');
+  const anchor = block.getFieldValue('ANCHOR');
+  // Build the new column order by inserting cols before/after anchor.
+  return (
+    `_tmp_cols = [c for c in _df.columns if c not in ${JSON.stringify(cols)}]\n` +
+    `_anchor_idx = _tmp_cols.index('${anchor}')\n` +
+    `_insert_at = _anchor_idx + ${position === 'after' ? 1 : 0}\n` +
+    `_df = _df[_tmp_cols[:_insert_at] + ${JSON.stringify(cols)} + _tmp_cols[_insert_at:]]\n`
+  );
 };

 pythonGenerator.forBlock['tidyblocks_transform_display'] = _block => {
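The relocate generator emits four lines of Python that remove the moved columns, find the anchor in what remains, and splice the moved columns back in. The same reordering can be sketched directly in TypeScript on a plain array of column names, which makes the before/after arithmetic easier to check by hand. The function name `relocateColumns` is hypothetical and used only for this illustration.

```typescript
// Illustrative sketch of the ordering logic the generated Python performs.
// `columns` is the current column order; `moved` are the columns to relocate.
function relocateColumns(
  columns: string[],
  moved: string[],
  position: 'before' | 'after',
  anchor: string
): string[] {
  // Everything except the moved columns, in original order (_tmp_cols).
  const rest = columns.filter(c => !moved.includes(c));
  // Position of the anchor among the remaining columns (_anchor_idx).
  const anchorIdx = rest.indexOf(anchor);
  if (anchorIdx === -1) {
    // Mirrors Python's list.index raising when the anchor is absent,
    // which also happens if the anchor is itself one of the moved columns.
    throw new Error(`anchor column not found: ${anchor}`);
  }
  // Insert before the anchor (offset 0) or just after it (offset 1).
  const insertAt = anchorIdx + (position === 'after' ? 1 : 0);
  return [...rest.slice(0, insertAt), ...moved, ...rest.slice(insertAt)];
}

console.log(relocateColumns(['a', 'b', 'c', 'd'], ['d'], 'before', 'b'));
console.log(relocateColumns(['a', 'b', 'c', 'd'], ['a'], 'after', 'c'));
```

Because the anchor is looked up after the moved columns are removed, choosing an anchor that is also in the moved list fails rather than silently reordering, in this sketch and, as far as the emitted code suggests, in the generated Python as well.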
+20 -7
packages/tidyblocks/src/toolbox.ts
···
 /**
  * Blockly toolbox definition for all jupyter-tidyblocks tidy-data blocks.
  * Color palette matches the original tidyblocks project by Greg Wilson.
+ *
+ * Block naming follows dplyr (tidyverse) conventions where an equivalent
+ * dplyr verb exists.
  */
 export const TIDYBLOCKS_TOOLBOX = {
   kind: 'categoryToolbox',
···
         { kind: 'block', type: 'tidyblocks_transform_filter' },
         { kind: 'block', type: 'tidyblocks_transform_select' },
         { kind: 'block', type: 'tidyblocks_transform_drop' },
-        { kind: 'block', type: 'tidyblocks_transform_create' },
+        { kind: 'block', type: 'tidyblocks_transform_mutate' }, // was: create
         { kind: 'block', type: 'tidyblocks_transform_rename' },
-        { kind: 'block', type: 'tidyblocks_transform_sort' },
-        { kind: 'block', type: 'tidyblocks_transform_unique' },
+        { kind: 'block', type: 'tidyblocks_transform_relocate' }, // new: dplyr relocate()
+        { kind: 'block', type: 'tidyblocks_transform_arrange' }, // was: sort
+        { kind: 'block', type: 'tidyblocks_transform_distinct' }, // was: unique
         { kind: 'block', type: 'tidyblocks_transform_groupby' },
         { kind: 'block', type: 'tidyblocks_transform_ungroup' },
         { kind: 'block', type: 'tidyblocks_transform_summarize' },
+        { kind: 'block', type: 'tidyblocks_transform_count' }, // new: dplyr count()
         { kind: 'block', type: 'tidyblocks_transform_running' },
         { kind: 'block', type: 'tidyblocks_transform_bin' },
         { kind: 'block', type: 'tidyblocks_transform_saveas' },
         { kind: 'block', type: 'tidyblocks_transform_fillna' },
         { kind: 'block', type: 'tidyblocks_transform_dropna' },
-        { kind: 'block', type: 'tidyblocks_transform_sample' },
-        { kind: 'block', type: 'tidyblocks_transform_head' },
-        { kind: 'block', type: 'tidyblocks_transform_tail' },
+        { kind: 'block', type: 'tidyblocks_transform_slice_sample' }, // was: sample
+        { kind: 'block', type: 'tidyblocks_transform_slice_head' }, // was: head
+        { kind: 'block', type: 'tidyblocks_transform_slice_tail' }, // was: tail
+        { kind: 'block', type: 'tidyblocks_transform_slice_min' }, // new: dplyr slice_min()
+        { kind: 'block', type: 'tidyblocks_transform_slice_max' }, // new: dplyr slice_max()
         { kind: 'block', type: 'tidyblocks_transform_display' }
       ]
     },
···
       colour: '#808080',
       contents: [
         { kind: 'block', type: 'tidyblocks_combine_join' },
-        { kind: 'block', type: 'tidyblocks_combine_glue' },
+        { kind: 'block', type: 'tidyblocks_combine_semi_join' }, // new: dplyr semi_join()
+        { kind: 'block', type: 'tidyblocks_combine_anti_join' }, // new: dplyr anti_join()
+        { kind: 'block', type: 'tidyblocks_combine_bind_rows' }, // was: glue
+        { kind: 'block', type: 'tidyblocks_combine_bind_cols' }, // new: dplyr bind_cols()
         { kind: 'block', type: 'tidyblocks_combine_cross_join' }
       ]
     },
···
       contents: [
         { kind: 'block', type: 'tidyblocks_op_arithmetic' },
         { kind: 'block', type: 'tidyblocks_op_compare' },
+        { kind: 'block', type: 'tidyblocks_op_between' }, // new: dplyr between()
         { kind: 'block', type: 'tidyblocks_op_logic' },
         { kind: 'block', type: 'tidyblocks_op_not' },
         { kind: 'block', type: 'tidyblocks_op_ifelse' },
+        { kind: 'block', type: 'tidyblocks_op_coalesce' }, // new: dplyr coalesce()
+        { kind: 'block', type: 'tidyblocks_op_n_distinct' }, // new: dplyr n_distinct()
         { kind: 'block', type: 'tidyblocks_op_typecheck' },
         { kind: 'block', type: 'tidyblocks_op_convert' },
         { kind: 'block', type: 'tidyblocks_op_datetime' },