Block Reference#

Complete reference for all blocks available in the Tidy Data toolbox. Blocks are organized by category, matching the sidebar in the Blockly editor.

Block names follow dplyr (tidyverse) conventions where an equivalent verb exists. The dplyr equivalent column shows the R function that inspired the block; a dash (—) means there is no direct dplyr analogue.

Data `#FEBE4C`#

Source blocks start a pipeline. Each one creates a DataFrame, stores it in `_df`, and has no top connector, so nothing can chain into it.

Block label	Block type	dplyr equivalent	What it does	Python generated
penguins dataset	`tidyblocks_data_penguins`	—	Palmer Penguins dataset loaded via seaborn	`_df = sns.load_dataset('penguins')`
iris dataset	`tidyblocks_data_iris`	—	Fisher iris dataset: sepal/petal measurements for 3 species (via seaborn)	`_df = sns.load_dataset('iris')`
titanic dataset	`tidyblocks_data_titanic`	—	Titanic passenger survival dataset (via seaborn)	`_df = sns.load_dataset('titanic')`
gapminder dataset	`tidyblocks_data_gapminder`	—	Gapminder life expectancy / GDP dataset across countries and years (via plotly.express)	`_df = px.data.gapminder()`
colors dataset	`tidyblocks_data_colors`	—	Built-in table of 11 colors with RGB values	`_df = pd.DataFrame({...})`
earthquakes dataset	`tidyblocks_data_earthquakes`	—	2016 global earthquake data from gvwilson/tidyblocks	`_df = pd.read_csv('<url>')`
sequence 1 to N as col	`tidyblocks_data_sequence`	—	Integer sequence 1..N in a named column	`_df = pd.DataFrame({'col': range(1, N+1)})`
dataset named name	`tidyblocks_data_user`	—	Reference a DataFrame previously saved with save as	`_df = name.copy()`
read CSV path	`tidyblocks_data_csv`	—	Load a CSV file from a local or remote path	`_df = pd.read_csv('path')`

Transform `#76AADB`#

Transform blocks read from and write back to `_df`. They can be chained in any order between a source block and a terminal block.

Row operations#

Block label	Block type	dplyr equivalent	What it does	Python generated
filter where cond	`tidyblocks_transform_filter`	`filter()`	Keep only rows where the condition is `True`	`_df = _df[cond]`
arrange by cols ↑↓	`tidyblocks_transform_arrange`	`arrange()`	Sort rows by one or more columns, ascending or descending	`_df = _df.sort_values(by=[...], ascending=True/False)`
distinct by cols	`tidyblocks_transform_distinct`	`distinct()`	Remove duplicate rows, keeping one per unique combination of columns	`_df = _df.drop_duplicates(subset=[...])`
slice_head N rows	`tidyblocks_transform_slice_head`	`slice_head()`	Keep the first N rows	`_df = _df.head(N)`
slice_tail N rows	`tidyblocks_transform_slice_tail`	`slice_tail()`	Keep the last N rows	`_df = _df.tail(N)`
slice_sample N rows	`tidyblocks_transform_slice_sample`	`slice_sample()`	Randomly sample N rows	`_df = _df.sample(n=N)`
slice_min N rows by col	`tidyblocks_transform_slice_min`	`slice_min()`	Keep the N rows with the smallest values in a column	`_df = _df.nsmallest(N, 'col')`
slice_max N rows by col	`tidyblocks_transform_slice_max`	`slice_max()`	Keep the N rows with the largest values in a column	`_df = _df.nlargest(N, 'col')`
drop rows with missing in cols	`tidyblocks_transform_dropna`	— (`tidyr::drop_na`)	Remove rows that have missing values in the specified columns	`_df = _df.dropna(subset=[...])`

Column operations#

Block label	Block type	dplyr equivalent	What it does	Python generated
select columns cols	`tidyblocks_transform_select`	`select()`	Keep only the named columns	`_df = _df[[...]]`
drop columns cols	`tidyblocks_transform_drop`	`select(-col)`	Remove the named columns	`_df = _df.drop(columns=[...])`
mutate col = expr	`tidyblocks_transform_mutate`	`mutate()`	Add a new column or overwrite an existing one with an expression	`_df = _df.assign(**{'col': expr})`
rename old to new	`tidyblocks_transform_rename`	`rename()`	Rename a single column	`_df = _df.rename(columns={'old': 'new'})`
relocate cols before/after anchor	`tidyblocks_transform_relocate`	`relocate()`	Move one or more columns to a new position relative to an anchor column	reorders `_df.columns`
fill missing in col with val	`tidyblocks_transform_fillna`	— (`tidyr::replace_na`)	Replace missing values in a column with a given value	`_df = _df.assign(**{'col': _df['col'].fillna(val)})`
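
The table only says that relocate reorders `_df.columns`. One way this could work is sketched below; the `relocate` helper is hypothetical and is not taken from the generated code.

```python
import pandas as pd

def relocate(df, cols, anchor, before=True):
    """Hypothetical helper: move `cols` next to `anchor`, keeping the other columns in order."""
    rest = [c for c in df.columns if c not in cols]
    # Raises ValueError if the anchor is itself one of the moved columns
    pos = rest.index(anchor) + (0 if before else 1)
    return df[rest[:pos] + list(cols) + rest[pos:]]

_df = pd.DataFrame({"a": [1], "b": [2], "c": [3], "d": [4]})
moved_before = relocate(_df, ["d"], anchor="b", before=True)
moved_after = relocate(_df, ["a"], anchor="c", before=False)

print(list(moved_before.columns))
print(list(moved_after.columns))
```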

Grouping & aggregation#

Block label	Block type	dplyr equivalent	What it does	Python generated
group by cols	`tidyblocks_transform_groupby`	`group_by()`	Group rows by the values in one or more columns for use with summarize or running	`_df = _df.groupby([...], as_index=False)`
ungroup	`tidyblocks_transform_ungroup`	`ungroup()`	Remove grouping and reset the row index	`_df = _df.reset_index(drop=True)`
summarize fn of col as result	`tidyblocks_transform_summarize`	`summarize()`	Aggregate each group (or the whole DataFrame) to a single row using count / sum / mean / median / min / max / std / var / n distinct / any / all	`_df = _df.agg(**{'result': ('col', 'fn')}).reset_index()`
count by cols	`tidyblocks_transform_count`	`count()`	Count rows for each unique combination of the specified columns	`_df = _df.groupby([...], as_index=False).size().rename(columns={'size': 'n'})`
running fn of col as result	`tidyblocks_transform_running`	— (window fns)	Compute a cumulative operation (cumsum / cummax / cummin / cummean / row index) across rows	`_df = _df.assign(**{'result': _df['col'].cumsum()})` etc.
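
The following sketch shows what group by followed by summarize, count, and running compute, using plain pandas. It is an approximation written for this reference, not the exact generated code (for example, it omits the trailing `reset_index()`), and the data is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"species": ["a", "a", "b"], "mass": [3.0, 5.0, 4.0]})

# group by species, then summarize mean of mass as mean_mass
summary = df.groupby(["species"], as_index=False).agg(**{"mean_mass": ("mass", "mean")})

# count by species
counts = df.groupby(["species"], as_index=False).size().rename(columns={"size": "n"})

# running cumsum of mass as total
running = df.assign(**{"total": df["mass"].cumsum()})

print(summary)
print(counts)
print(running)
```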

Utilities#

Block label	Block type	dplyr equivalent	What it does	Python generated
bin col into N buckets as result	`tidyblocks_transform_bin`	— (`cut()`)	Discretize a numeric column into N equal-width interval buckets	`_df = _df.assign(**{'result': pd.cut(_df['col'], bins=N).astype(str)})`
save as name	`tidyblocks_transform_saveas`	—	Copy the current DataFrame into a named Python variable for later use with dataset named	`name = _df.copy()`
display table	`tidyblocks_transform_display`	—	Render the current DataFrame as an HTML table in the output cell	`display(_df)`

Combine `#808080`#

Combine blocks merge the current `_df` with a second DataFrame that was previously saved with the save as block.

Mutating joins (add columns from the right table)#

Block label	Block type	dplyr equivalent	What it does	Python generated
inner/left/right/outer join other on left col = right col	`tidyblocks_combine_join`	`inner_join()` / `left_join()` / `right_join()` / `full_join()`	Join two DataFrames on matching key columns. Choose inner (only matching rows), left (all left rows), right (all right rows), or outer (all rows from both)	`_df = pd.merge(_df, other, left_on='lk', right_on='rk', how='...')`
cross join with other	`tidyblocks_combine_cross_join`	`cross_join()`	Cartesian product — every row in `_df` paired with every row in `other`	`_df = _df.merge(other, how='cross')`

Filtering joins (keep/remove rows based on a match, no new columns)#

Block label	Block type	dplyr equivalent	What it does	Python generated
semi join other on left col = right col	`tidyblocks_combine_semi_join`	`semi_join()`	Keep only rows in `_df` that have a matching key in `other`. No columns from `other` are added.	`_df = _df[_df['lk'].isin(other['rk'])]`
anti join other on left col = right col	`tidyblocks_combine_anti_join`	`anti_join()`	Keep only rows in `_df` that have no matching key in `other`	`_df = _df[~_df['lk'].isin(other['rk'])]`
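
To make the difference between filtering and mutating joins concrete, the sketch below runs the `isin` patterns from the table next to an inner join. The frames and column names are illustrative.

```python
import pandas as pd

_df = pd.DataFrame({"id": [1, 2, 3], "name": ["x", "y", "z"]})
other = pd.DataFrame({"key": [2, 3, 3, 4]})  # key 3 appears twice

semi = _df[_df["id"].isin(other["key"])]     # semi join: rows with a match
anti = _df[~_df["id"].isin(other["key"])]    # anti join: rows without a match
inner = pd.merge(_df, other, left_on="id", right_on="key", how="inner")

print(len(semi), len(anti), len(inner))
```

The semi join keeps each matching row of `_df` once, while the inner join repeats the row for `id` 3 because `other` contains that key twice.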

Binding (stack or glue tables together)#

Block label	Block type	dplyr equivalent	What it does	Python generated
bind_rows with other label column src	`tidyblocks_combine_bind_rows`	`bind_rows()`	Vertically stack `_df` on top of `other`, adding a label column to identify the source of each row	`_df = pd.concat([_df.assign(src='left'), other.assign(src='right')]).reset_index(drop=True)`
bind_cols with other	`tidyblocks_combine_bind_cols`	`bind_cols()`	Horizontally bind `_df` and `other` by column position. Both tables must have the same number of rows.	`_df = pd.concat([_df, other], axis=1)`
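
One caveat worth illustrating: `pd.concat(..., axis=1)` aligns rows on index labels, not on position. Whether the bind_cols block resets the index first is not stated here, so the sketch below only demonstrates the pandas behavior on made-up frames.

```python
import pandas as pd

_df = pd.DataFrame({"a": [1, 2]}, index=[0, 1])
other = pd.DataFrame({"b": [10, 20]}, index=[5, 6])  # e.g. index left over from a filter

# Aligned on index labels: four rows, padded with NaN
aligned = pd.concat([_df, other], axis=1)

# Resetting both indexes first gives a positional bind
by_position = pd.concat(
    [_df.reset_index(drop=True), other.reset_index(drop=True)], axis=1
)

print(aligned.shape, by_position.shape)
```

If a bound table shows unexpected missing values, adding an ungroup block (which resets the row index) to each pipeline before binding may help.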

Plot `#A4C588`#

Plot blocks are terminal — they render a chart and have no bottom connector. All plots use Plotly Express.

Block label	Block type	dplyr equivalent	What it does	Python generated
bar chart x col y col	`tidyblocks_plot_bar`	—	Vertical bar chart	`px.bar(_df, x='col', y='col')`
box plot x col y col	`tidyblocks_plot_box`	—	Box-and-whisker plot showing median, IQR, and outliers	`px.box(_df, x='col', y='col')`
dot plot x col	`tidyblocks_plot_dot`	—	Strip/dot plot — one point per row along an axis	`px.strip(_df, x='col')`
histogram of col bins N	`tidyblocks_plot_histogram`	—	Frequency histogram with N bins	`px.histogram(_df, x='col', nbins=N)`
scatter plot x col y col color col trendline ☐	`tidyblocks_plot_scatter`	—	Scatter plot with optional color grouping and OLS trendline	`px.scatter(_df, x=..., y=..., color=..., trendline=...)`
line chart x col y col color col	`tidyblocks_plot_line`	—	Line chart with optional color grouping	`px.line(_df, x=..., y=..., color=...)`
violin plot x col y col	`tidyblocks_plot_violin`	—	Violin plot showing the distribution shape	`px.violin(_df, x='col', y='col')`
correlation heatmap	`tidyblocks_plot_heatmap`	—	Heatmap of pairwise Pearson correlations between all numeric columns	`px.imshow(_df.corr())`
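
As a sketch of what the correlation heatmap draws, the snippet below computes the Pearson correlation matrix for the numeric columns only. Selecting numeric columns explicitly with `select_dtypes` is an assumption made here so the example runs when text columns are present; the Plotly call itself is left out.

```python
import pandas as pd

_df = pd.DataFrame({
    "x": [1, 2, 3, 4],
    "y": [2, 4, 6, 8],
    "label": list("abcd"),  # non-numeric column, excluded below
})

# Matrix that a call such as px.imshow(...) would render
corr = _df.select_dtypes(include="number").corr()

print(corr)
```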

Stats `#BA93DB`#

Stats blocks are terminal: they print results and have no bottom connector. Tests and correlations use scipy.stats, the clustering blocks use scikit-learn, and describe uses pandas.

Block label	Block type	dplyr equivalent	What it does	Python generated
one-sample t-test column col vs mean μ	`tidyblocks_stats_ttest_one`	—	Two-sided one-sample t-test: tests whether the column mean equals μ. Prints t-statistic and p-value.	`stats.ttest_1samp(_df['col'], μ)`
two-sample t-test groups in group_col values in val_col	`tidyblocks_stats_ttest_two`	—	Two-sided two-sample t-test: splits rows into two groups and tests whether their means differ	`stats.ttest_ind(group_a, group_b)`
k-means x col y col k N label col	`tidyblocks_stats_kmeans`	—	K-means clustering on two columns; adds a cluster label column to `_df`	`KMeans(n_clusters=N).fit_predict(...)`
silhouette score x col y col labels col score col	`tidyblocks_stats_silhouette`	—	Computes the silhouette coefficient for existing cluster labels; adds a score column	`silhouette_score(X, labels)`
Pearson/Spearman/Kendall correlation of col_a and col_b	`tidyblocks_stats_correlation`	—	Computes pairwise correlation coefficient and p-value between two columns	`stats.pearsonr / spearmanr / kendalltau`
describe	`tidyblocks_stats_describe`	—	Prints `DataFrame.describe()` — count, mean, std, min, quartiles, max for every numeric column	`display(_df.describe())`
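
The two-sample t-test row mentions splitting rows into two groups without showing how. A minimal sketch, assuming the group column holds exactly two distinct values, might look like this; the data and the way the groups are selected are illustrative.

```python
import pandas as pd
from scipy import stats

_df = pd.DataFrame({
    "group": ["a"] * 4 + ["b"] * 4,
    "value": [1.0, 2.0, 3.0, 4.0, 11.0, 12.0, 13.0, 14.0],
})

# Assumption: the first two distinct values define the two groups
levels = _df["group"].unique()
group_a = _df.loc[_df["group"] == levels[0], "value"]
group_b = _df.loc[_df["group"] == levels[1], "value"]

result = stats.ttest_ind(group_a, group_b)
print(result.statistic, result.pvalue)
```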

Values `#E7553C`#

Value blocks are expression blocks — they produce a value and connect into input slots on transform or operation blocks. They do not have statement connectors.

Block label	Block type	dplyr equivalent	What it does	Python generated
column col	`tidyblocks_value_column`	—	Reference a DataFrame column by name	`_df['col']`
number	`tidyblocks_value_number`	—	A numeric literal	`0`, `3.14`, etc.
"text"	`tidyblocks_value_text`	—	A string literal	`'text'`
true / false	`tidyblocks_value_logical`	—	A boolean literal	`True` / `False`
date YYYY-MM-DD	`tidyblocks_value_datetime`	—	A date/time constant	`pd.Timestamp('YYYY-MM-DD')`
missing	`tidyblocks_value_missing`	`NA`	An explicit missing (NA/NaN) value	`float("nan")`
Normal(mean μ std σ)	`tidyblocks_value_normal`	—	Draw a column of values from a Normal distribution	`np.random.normal(μ, σ, len(_df))`
Uniform(low a high b)	`tidyblocks_value_uniform`	—	Draw a column of values from a Uniform distribution	`np.random.uniform(a, b, len(_df))`
Exponential(lambda λ)	`tidyblocks_value_exponential`	—	Draw a column of values from an Exponential distribution	`np.random.exponential(1/λ, len(_df))`

Operations `#F9B5B2`#

Operation blocks are expression blocks used inside filter, mutate, fill missing, and similar blocks. They take value inputs and return a computed value.

Numeric & comparison#

Block label	Block type	dplyr equivalent	What it does	Python generated
a + b, a - b, a × b, a ÷ b, a % b, a ** b	`tidyblocks_op_arithmetic`	—	Standard arithmetic on two values	`(a + b)`, `(a * b)`, etc.
a = b, a ≠ b, a < b, a ≤ b, a > b, a ≥ b	`tidyblocks_op_compare`	—	Element-wise comparison, returning a boolean column	`(a == b)`, `(a < b)`, etc.
x between left and right	`tidyblocks_op_between`	`between()`	Return `True` for values within the inclusive range `[left, right]`	`x.between(left, right)`
abs / round / floor / ceil / sqrt / log / exp ( val )	`tidyblocks_op_math`	—	Apply a standard math function to a column	`val.abs()`, `np.sqrt(val)`, etc.

Logic#

Block label	Block type	dplyr equivalent	What it does	Python generated
a AND b / a OR b	`tidyblocks_op_logic`	—	Element-wise logical AND/OR on two boolean columns	`(a & b)`, `(a | b)`
NOT val	`tidyblocks_op_not`	—	Element-wise logical NOT	`~(val)`
if cond then x else y	`tidyblocks_op_ifelse`	`if_else()`	Return `x` where `cond` is `True`, `y` elsewhere	`np.where(cond, x, y)`
coalesce val with replacement	`tidyblocks_op_coalesce`	`coalesce()`	Replace missing values in `val` with values from `replacement`	`val.fillna(replacement)`
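
A small sketch of how the logic blocks treat missing values, using the pandas and NumPy calls from the table on made-up data.

```python
import numpy as np
import pandas as pd

_df = pd.DataFrame({"mass": [3.0, None, 6.0], "backup": [1.0, 2.0, 3.0]})

# a AND b: comparisons against a missing value evaluate to False
in_range = (_df["mass"] > 2.5) & (_df["mass"] < 5.0)

# if cond then x else y: a missing value fails the condition, so it gets the else branch
size = np.where(_df["mass"] > 4.0, "big", "small")

# coalesce: fill missing values from another column
filled = _df["mass"].fillna(_df["backup"])

print(in_range.tolist(), list(size), filled.tolist())
```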

Type operations#

Block label	Block type	dplyr equivalent	What it does	Python generated
val is missing / is number / is text / is date / is boolean	`tidyblocks_op_typecheck`	—	Test whether each element matches a specific type	`val.isna()`, `val.apply(lambda v: isinstance(v, ...))`, etc.
convert val to number / text / bool / datetime	`tidyblocks_op_convert`	—	Cast a column to a different type	`pd.to_numeric(val)`, `val.astype(str)`, etc.
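
When converting text to numbers, `pd.to_numeric` raises an error by default if a value cannot be parsed. Whether the convert block passes an `errors=` argument is not stated here, so the sketch below simply shows both pandas behaviors.

```python
import pandas as pd

val = pd.Series(["1", "2.5", "oops"])

# Default behavior: an unparseable string raises ValueError
try:
    pd.to_numeric(val)
    raised = False
except ValueError:
    raised = True

# errors="coerce" turns unparseable values into NaN instead
coerced = pd.to_numeric(val, errors="coerce")

print(raised, coerced.tolist())
```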

Date & time#

Block label	Block type	dplyr equivalent	What it does	Python generated
year / month / day / weekday / hour / minute / second of val	`tidyblocks_op_datetime`	—	Extract a calendar component from a datetime column	`val.dt.year`, `val.dt.month`, etc.
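
A brief sketch of the `.dt` accessor on illustrative dates. The weekday numbering shown (Monday = 0) is that of pandas `dt.weekday`; that the block uses this attribute is an assumption.

```python
import pandas as pd

val = pd.to_datetime(pd.Series(["2016-01-04", "2016-12-31"]))

years = val.dt.year
months = val.dt.month
weekdays = val.dt.weekday  # Monday = 0 ... Sunday = 6

print(years.tolist(), months.tolist(), weekdays.tolist())
```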

Window & ranking#

Block label	Block type	dplyr equivalent	What it does	Python generated
shift val by N	`tidyblocks_op_shift`	`lag()` / `lead()`	Shift values forward (positive N = lag) or backward (negative N = lead)	`val.shift(N)`
n_distinct val	`tidyblocks_op_n_distinct`	`n_distinct()`	Count the number of distinct (unique) values in a column	`val.nunique()`
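
The sign convention for shift is easy to mix up, so here is a minimal sketch on made-up values.

```python
import pandas as pd

val = pd.Series([10, 20, 30])

lagged = val.shift(1)    # positive N: each row sees the previous value
led = val.shift(-1)      # negative N: each row sees the next value

print(lagged.tolist())
print(led.tolist())
```

Rows with no neighbor to borrow from become missing, which also turns an integer column into floats.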

String#

Block label	Block type	dplyr equivalent	What it does	Python generated
val . upper / lower / strip / length	`tidyblocks_op_string`	—	Apply a string operation to a text column	`val.str.upper()`, `val.str.len()`, etc.
val contains pattern	`tidyblocks_op_str_contains`	`stringr::str_detect()`	Return `True` where the string column matches a pattern	`val.str.contains('pattern', na=False)`
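
In pandas, `str.contains` treats the pattern as a regular expression by default. Whether the block exposes a literal-match option is not stated here; the sketch below only contrasts the two pandas modes on illustrative strings.

```python
import pandas as pd

val = pd.Series(["a.b", "axb", None])

# Default: "." is a regex wildcard, so "axb" also matches
regex_hits = val.str.contains("a.b", na=False)

# regex=False matches the literal text "a.b" only
literal_hits = val.str.contains("a.b", na=False, regex=False)

print(regex_hits.tolist(), literal_hits.tolist())
```

With `na=False`, missing strings count as non-matches rather than propagating as missing.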

Pipeline rules#

Block role	Has top connector	Has bottom connector	Examples
Source	No	Yes	all Data blocks
Transform	Yes	Yes	filter, mutate, arrange, …
Terminal	Yes	No	display table, all Plot blocks, all Stats blocks

A valid pipeline must be: one source → zero or more transforms → one terminal.
