# duckdb.yazi

**Uses [duckdb](https://github.com/duckdb/duckdb) to quickly preview and summarize data files in [yazi](https://github.com/sxyazi/yazi)!**

<br>

<https://github.com/user-attachments/assets/ff2b11fb-d6fa-4b6a-b1a9-8aceed520189>

<br><br>

## What does it do?

This plugin previews your data files in yazi using DuckDB, with two available view modes:

- Preview csv, tsv, json, or parquet files in the following modes:
  - Standard mode: displays the file as a table
  - Summarized mode (the default): uses DuckDB's summarize function, enhanced with custom formatting for readability
- Preview duckdb databases
  - See the tables and the number of rows, columns, and indexes in each, plus a list of column names in index order.
- Scroll rows using `J` and `K`
- Scroll columns using your chosen keys (I use `H` and `L`)
- Change modes by pressing `K` when at the top of a file

Supported file types:
- .csv
- .tsv
- .txt - if tabular data
- .json
- .parquet
- .xlsx
- .duckdb
- .db - if the file is a duckdb database
<br><br>

## Features

### Column Scrolling

<br>

<https://github.com/user-attachments/assets/b347a7e8-05ea-442d-a88e-e2447975b653>

<br>

- Now supports scrolling horizontally (by column).
- Works in all views.
- In the database view you can even scroll through the list of column names.
- Output highlighting should now work on any OS (where duckdb supports it).

>Requires a small amount of extra configuration compared to previous versions: keymaps (I use `H` and `L`) and some other additional customisation options.
>
>See the [Installation](https://github.com/wylie102/duckdb.yazi/tree/main?tab=readme-ov-file#installation) and [Configuration](https://github.com/wylie102/duckdb.yazi/tree/main?tab=readme-ov-file#configurationcustomisation) sections.

>**Cache changes - update 04/04/25** - If you want info on the latest (cache-related) changes, see [here](https://github.com/wylie102/duckdb.yazi?tab=readme-ov-file#setup-and-usage-changes-from-previous-versions). Otherwise keep reading for the new features and config options below.
<br>

<br>

### Output Syntax Highlighting

- Passes through the colors from the duckdb output, as you would see when using duckdb directly in the terminal.
- These colors can be configured in your `~/.duckdbrc` file; see the Configuration section for details.

<br>

**Syntax highlighting with duckdb's default color scheme.**
<img width="700" alt="Screenshot 2025-04-02 at 14 53 38" src="https://github.com/user-attachments/assets/d2267298-b91b-496c-ae74-1d432b826f6f" />

<br>

**Syntax highlighting with a customized color scheme.**
<img width="700" alt="Screenshot 2025-04-02 at 14 44 08" src="https://github.com/user-attachments/assets/965a0a4e-e4ed-4d88-ab95-84cd543f2a58" />

<br>
### Preview DuckDB Databases

- If you open a `.db` or `.duckdb` file directly, the plugin lists all tables in the database.
- Each entry includes:
  - Table name
  - Row count
  - Column count
  - Primary key presence
  - Index count
  - All column names (aggregated and in index order)
- Tables are **alphabetically ordered** and paginated for smooth scrolling.
- Reads directly from the db in read-only mode for file safety.

<br>

<img width="700" alt="Screenshot 2025-04-02 at 14 46 19" src="https://github.com/user-attachments/assets/c640d6f3-d9f6-4d98-acd8-9e4c87c6e728" />

<br>
### More customisation options - row_id (row number) and width of the min/max columns

- Row id - shown in standard view to help keep track when scrolling. The default is off, but it can be turned on in the `init.lua` options.
- Width of the min and max columns. The default is now 21, twice as wide as previously, and is customisable in `init.lua`; the unit is the number of characters shown.

<br>

<img width="700" alt="Screenshot 2025-04-02 at 14 49 26" src="https://github.com/user-attachments/assets/6c8fb1ae-3de8-41ce-9c90-0279dc3b5e61" />

<br><br>
### Preview mode is now toggleable

- Preview mode can be toggled within yazi.
- Press `K` at the top of the file to toggle between "standard" and "summarized".
- The mode enabled at startup is customisable in `init.lua`; see the Configuration section.

### Performance improvements through caching

- "Standard" and "summarized" views are cached upon first load, improving scrolling performance.

- Note that entering a directory you haven't entered before (or one containing files that have been changed) triggers caching. Until the caches are generated, summarized mode may take longer to show, as it will be run on the original file, and scrolling other files during this time (especially large ones) can slow things down even further, as new queries on those files will be competing with the cache queries. It is worth waiting until the caches load (displayed in the bottom right corner), or switching to standard view during these first few seconds. This will be most apparent on large, non-parquet files.
<br><br>

## Installation

### Installing dependencies

First you will need Yazi and DuckDB installed.

- [Yazi installation instructions](https://yazi-rs.github.io/docs/installation)

- [DuckDB installation instructions](https://duckdb.org/docs/installation/?version=stable&environment=cli&platform=macos&download_method=direct)

Once these are installed, you can use the yazi plugin manager to install the plugin.

Run the command:

```
ya pack -a wylie102/duckdb
```

in your terminal.
<br>

### yazi.toml

Then navigate to your [yazi.toml](https://yazi-rs.github.io/docs/configuration/yazi#manager.ratio) file (this should be in the `yazi` folder in your `config` directory)
and add:

```toml
[plugin]
prepend_previewers = [
  { name = "*.csv", run = "duckdb" },
  { name = "*.tsv", run = "duckdb" },
  { name = "*.json", run = "duckdb" },
  { name = "*.parquet", run = "duckdb" },
  { name = "*.txt", run = "duckdb" },
  { name = "*.xlsx", run = "duckdb" },
  { name = "*.db", run = "duckdb" },
  { name = "*.duckdb", run = "duckdb" }
]

prepend_preloaders = [
  { name = "*.csv", run = "duckdb", multi = false },
  { name = "*.tsv", run = "duckdb", multi = false },
  { name = "*.json", run = "duckdb", multi = false },
  { name = "*.parquet", run = "duckdb", multi = false },
  { name = "*.txt", run = "duckdb", multi = false },
  { name = "*.xlsx", run = "duckdb", multi = false }
]
```
>Note on .txt: I have tried to exclude files that contain only raw text (if duckdb reads only one column). However, if you never work with .txt files that contain tabular data (basically misnamed csv or tsv files), then you can simply leave the .txt lines out of your setup.

<br>

>Note on .xlsx: This can be temperamental, especially around inferring types, due to the way that duckdb handles excel files. This feature currently uses st_read from the spatial extension, since it gives the most consistent type results. Hopefully they will soon implement some of the smart type detection from the csv reader in their excel extension, and then we can use that instead.

<br>
### init.lua

Then create an `init.lua` file in the same folder and add:

```lua
-- DuckDB plugin configuration
require("duckdb"):setup()
```

This is where the configuration/settings can go ([see below](https://github.com/wylie102/duckdb.yazi?tab=readme-ov-file#configurationcustomisation)), but the `init.lua` file and this line are required for the plugin to run, even if the settings are blank. Another option is to add all of the settings with their defaults, so that they are easy to change at a later date.
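
If you prefer to spell the settings out now, a version with every option set to its default (the defaults are listed in the Configuration section below) would look like:

```lua
-- DuckDB plugin configuration, with all settings at their defaults
require("duckdb"):setup({
  mode = "summarized", -- or "standard"
  cache_size = 500,
  row_id = false, -- or true / "dynamic"
  minmax_column_width = 21,
  column_fit_factor = 10.0,
})
```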
<br>
### keymap.toml

Then in your [keymap.toml](https://yazi-rs.github.io/docs/configuration/keymap) file add:

```toml
[[manager.prepend_keymap]]
on = "H"
run = "plugin duckdb -1"
desc = "Scroll one column to the left"

[[manager.prepend_keymap]]
on = "L"
run = "plugin duckdb +1"
desc = "Scroll one column to the right"

[[manager.prepend_keymap]]
on = ["g", "o"]
run = "plugin duckdb -open"
desc = "Open with duckdb"

[[manager.prepend_keymap]]
on = ["g", "u"]
run = "plugin duckdb -ui"
desc = "Open with duckdb ui"
```

>I use `H` and `L` because it makes logical sense to me.
>
>But these overwrite:
>
>- `H` - previous directory and
>- `L` - next directory
>
>(different from the standard `h` and `l` for parent and child directory).
>
>So if you use those, you might want to choose something else, or remap them to `<C-h>` and `<C-l>` instead.

<br>
### Additional setup and recommended plugins for more preview space

To use the plugin with a larger preview window, add this to your `yazi.toml`:

```toml
[manager]
ratio = [1, 2, 5]
```

For reference, the default ratio is `[1, 4, 3]`.

You can also use the [maximize the preview pane plugin](https://github.com/yazi-rs/plugins/tree/main/toggle-pane.yazi).

<br><br>
249249+250250+## Configuration/Customisation
251251+252252+Configuration of yazi.duckdb is done via the `init.lua` file in `config/yazi` (where your plugin folder and yazi.toml file live).
253253+If you don't have one you can just create one.
254254+Add the following:
255255+256256+```lua
257257+ -- DuckDB plugin configuration
258258+require("duckdb"):setup({
259259+ mode = "standard"/"summarized", -- Default: "summarized"
260260+ cache_size = 1000 -- Default: 500
261261+ row_id = true/false/"dynamic", -- Default: false
262262+ minmax_column_width = int -- Default: 21
263263+ column_fit_factor = float -- Default: 10.0
264264+})
265265+```
266266+267267+If you don't include a setting, it will revert to the default.
268268+269269+But the setup call `require("duckdb"):setup()` is still required for the plugin to intialize correctly.
<br>

### Explanation of settings

- mode - the view used by default on startup. The default is summarized, but this can sometimes be slow if run while the files are also being cached. Most of the time it will be the same speed as standard, so pick the one you like.

- cache_size - the number of rows cached in standard mode. Increase it if you want to be able to scroll further down in your files, but be aware that this could impact cache size and cache performance if made too large. If you change this setting you will need to run `yazi --clear-cache` for it to take effect.

- row_id - displays a row-number column when viewing in standard mode. If set to "dynamic" it will only turn on when scrolling columns, and it will always be the leftmost column.

- minmax_column_width - the number of characters displayed in the min and max columns in summarized view. The default is 21, which is roughly enough to see the date and time in a datetime column. If you need more, set it higher; if you want min/max to take up less space, set it lower.

- column_fit_factor - this one is actually important but might feel a bit counter-intuitive, so have a look below.
  - TLDR: duckdb.yazi is designed to overspill the screen on the right side. Unless all your columns are incredibly narrow (you can see the right border of your table when there are still more columns to scroll), or you work with tables with a very large number of columns and scrolling them feels slightly slow, you can probably leave it alone.
  - Slightly longer instructions: to fully optimise this, 1. lower it until your columns no longer spill off the end of the screen (check this on a few files); 2. increase it by 1 so that the columns again spill over the right border.
  - More detailed explanation: implementing column scrolling also gave us a mechanism to request only the columns we need to fill (in reality, slightly overfill) the screen. The reason for this is that if the table is incredibly wide (has a high number of columns), requesting everything would slow down the query. But while the plugin can detect how wide the display area is, it doesn't know how wide your columns are. So this number represents the average amount of space (in characters) duckdb.yazi expects each column to take up when deciding how many columns to request: columns_displayed = display_area_width / column_fit_factor. So a larger number means fewer columns, and a smaller number means more columns. Ideally you want the columns to **just** spill over the right border of the screen, which gives the feeling of movement when scrolling. The default, 10.0, should accommodate most column sizes while giving good performance. Setting it to 7.73 should display even the narrowest columns correctly, but may cause queries to be slightly slower when working with very large numbers of columns.
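
The arithmetic above can be sketched in Lua (the pane width here is made up purely for illustration):

```lua
-- Roughly how many columns duckdb.yazi will request for a given pane width:
-- columns_displayed = floor(display_area_width / column_fit_factor)
local display_area_width = 120 -- hypothetical preview pane width, in characters

print(math.floor(display_area_width / 10.0)) -- default factor: 12 columns
print(math.floor(display_area_width / 7.73)) -- narrow-column factor: 15 columns
```

(The current scroll offset is added on top of this when you scroll columns.)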
### Configuring duckdb

Configuration of DuckDB can be done in the `~/.duckdbrc` file.
This should be placed in your home directory ([duckdb docs](https://duckdb.org/docs/stable/operations_manual/footprint_of_duckdb/files_created_by_duckdb)).

You can customise the colors of the preview using the following options:

```
.highlight_colors layout gray
.highlight_colors column_name magenta bold
.highlight_colors column_type gray
.highlight_colors string_value cyan
.highlight_colors numeric_value green
.highlight_colors temporal_value blue
.highlight_colors footer gray
```

The above configuration is what is used in the video at the top of the readme and in the screenshots of the color highlighting section,
although the actual colours will depend on your terminal/yazi color scheme.
These should be placed in your `~/.duckdbrc` file as-is.
No header is needed; they are simply commands run on the startup of any duckdb instance (when using the CLI).
They will change the color of the output both in duckdb.yazi and when using duckdb in the CLI.

Color options are:
red|green|yellow|blue|magenta|cyan|white

You can also specify bold, underline, or bold_underline after the colors,
e.g. `.highlight_colors column_type red bold_underline`

If the file is empty or doesn't exist, then the default duckdb color scheme will be used.
This uses gray for borders and NULLs, and looks like this:

<img width="700" alt="Screenshot 2025-04-02 at 14 53 38" src="https://github.com/user-attachments/assets/d2267298-b91b-496c-ae74-1d432b826f6f" />

You can also turn the highlighting off by adding `.highlight_results off`,
in which case it will look like below.

<img width="700" alt="Screenshot 2025-03-22 at 18 00 06" src="https://github.com/user-attachments/assets/db09fff9-2db1-4273-9ddf-34d0bf087967" />

More information [here](https://duckdb.org/docs/stable/clients/cli/dot_commands#configuring-the-result-syntax-highlighter)

<br><br>
## Setup and usage changes from previous versions

### A note on the latest update

Added logic for reading `.xlsx` and `.txt` files; you can just add these to your yazi.toml file to be able to view them.
Also added the ability to set the cache row size in the `init.lua` file.
yazi/.config/yazi/plugins/duckdb.yazi/main.lua
--- @since 25.4.8
-- DuckDB Plugin for Yazi
local M = {}

local update_state = ya.sync(function(state, action, category, key, value)
  -- Ensure the subtable for the category exists.
  state[category] = state[category] or {}

  if action == "set" then
    state[category][key] = value
  elseif action == "get" then
    return state[category][key]
  elseif action == "check" then
    return state[category][key] ~= nil
  elseif action == "clear" then
    state[category] = {}
  else
    ya.err("Unknown action: " .. tostring(action))
  end
end)

local function set_opts(key, value)
  update_state("set", "opts", key, value)
end

local function get_opts(key)
  return update_state("get", "opts", key)
end

local function add_to_list(category, cache_str)
  update_state("set", category, cache_str, true)
end

local function remove_from_list(category, cache_str)
  update_state("set", category, cache_str, nil)
end

local function is_on_list(category, cache_str)
  return update_state("check", category, cache_str)
end

local function clear_list(category)
  update_state("clear", category) -- replaces the whole list with an empty table
end

local function add_queries_to_table(target_table, queries)
  if type(queries) == "table" then
    for _, item in ipairs(queries) do
      table.insert(target_table, "-c")
      table.insert(target_table, item)
    end
  else
    table.insert(target_table, "-c")
    table.insert(target_table, queries)
  end
end

local function generate_data_source_string(target, file_type)
  local url_string = "'" .. tostring(target) .. "'"
  if file_type == "excel" then
    return string.format("st_read(%s)", url_string)
  elseif file_type == "text" then
    return string.format("read_csv(%s)", url_string)
  else
    return url_string
  end
end
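
-- For illustration (hypothetical paths), the returned data source string wraps
-- the quoted file path in the reader each file type needs:
--   generate_data_source_string(Url("/tmp/a.xlsx"), "excel") --> st_read('/tmp/a.xlsx')
--   generate_data_source_string(Url("/tmp/a.txt"), "text")   --> read_csv('/tmp/a.txt')
--   generate_data_source_string(Url("/tmp/a.csv"), "csv")    --> '/tmp/a.csv'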

local extension_map = {
  csv = "csv",
  tsv = "csv",
  txt = "text",
  json = "json",
  parquet = "parquet",
  xlsx = "excel",
  duckdb = "duckdb",
  db = "duckdb",
}

local function get_extension(filename)
  -- Match the last "dot + word characters" at the end of the string
  return filename:match("^.+%.([a-zA-Z0-9]+)$")
end

local function check_file_type(path)
  local name = path.name or ""
  local ext = get_extension(name)
  if ext then
    local filetype = extension_map[ext:lower()]
    if filetype then
      return filetype
    end
  end
  ya.err("File is not a supported file type")
end

local get_hovered_url_string = ya.sync(function()
  return tostring(cx.active.current.hovered.url)
end)

local duckdb_opener = ya.sync(function(_, arg)
  local hovered_url = Url(get_hovered_url_string())
  local file_type = check_file_type(hovered_url)
  local command = "duckdb "
  if file_type == "excel" then
    command = string.format([[%s-cmd "install spatial;" -cmd "load spatial;" ]], command)
    ya.dbg("command: " .. tostring(command))
  end

  if file_type ~= "duckdb" then
    local table_name = '\\"' .. hovered_url.stem .. '\\"'
    local data_source_string = generate_data_source_string(hovered_url, file_type)
    local query = string.format("CREATE TABLE %s AS FROM %s;", table_name, data_source_string)
    command = string.format('%s-cmd "%s"', command, query)
    ya.dbg("command final: " .. tostring(command))
  else
    command = command .. tostring(hovered_url)
  end

  if arg ~= "-open" then
    command = string.format("%s -ui", command)
  end
  ya.emit("shell", { command, block = true, orphan = true, confirm = true })
end)

function M:entry(job)
  local arg = job.args and job.args[1]
  if arg ~= "+1" and arg ~= "-1" then
    return duckdb_opener(arg)
  end
  local scroll_delta = tonumber(arg)

  if not scroll_delta then
    ya.err("DuckDB column scroll entry: Invalid or missing scroll delta; exiting.")
    return
  end

  local scrolled_columns = get_opts("scrolled_columns") or 0
  scrolled_columns = math.max(0, scrolled_columns + scroll_delta)
  set_opts("scrolled_columns", scrolled_columns)

  ya.emit("seek", { "lateral scroll" })
end

-- Setup from init.lua: require("duckdb"):setup({ mode = "standard"/"summarized" })
function M:setup(opts)
  opts = opts or {}

  local mode = opts.mode or "summarized"
  local operating_system = ya.target_os()
  local column_width = opts.minmax_column_width or 21
  local row_id = opts.row_id
  if row_id == nil then
    row_id = false
  end
  local column_fit_factor = opts.column_fit_factor or 10
  local limit = opts.cache_size or 500

  set_opts("mode", mode)
  set_opts("mode_changed", false)
  set_opts("re_peek", false)
  set_opts("os", operating_system)
  set_opts("column_width", column_width)
  set_opts("row_id", row_id)
  set_opts("scrolled_columns", 0)
  set_opts("column_fit_factor", column_fit_factor)
  set_opts("limit", limit)
end

local function generate_preload_query(job, mode, file_type, limit)
  local data_source_string = generate_data_source_string(job.file.url, file_type)
  local limit_string = ""
  if limit then
    limit_string = " LIMIT " .. tostring(limit)
  end
  if mode == "standard" then
    return "FROM " .. data_source_string .. limit_string
  else
    return string.format(
      "SELECT * EXCLUDE(null_percentage), CAST(null_percentage AS DOUBLE) AS null_percentage FROM (SUMMARIZE FROM %s)",
      data_source_string
    )
  end
end

local function generate_summary_cte(target)
  local column_width = get_opts("column_width")
  return string.format(
    [[
SELECT
  column_name AS column,
  column_type AS type,
  count,
  approx_unique AS unique,
  null_percentage AS "null%%",
  LEFT(min, %d) AS min,
  LEFT(max, %d) AS max,
  CASE
    WHEN avg IS NULL THEN NULL
    WHEN TRY_CAST(avg AS DOUBLE) IS NULL THEN CAST(avg AS VARCHAR)
    WHEN CAST(avg AS DOUBLE) < 100000 THEN CAST(ROUND(CAST(avg AS DOUBLE), 2) AS VARCHAR)
    WHEN CAST(avg AS DOUBLE) < 1000000 THEN CAST(ROUND(CAST(avg AS DOUBLE) / 1000, 1) AS VARCHAR) || 'k'
    WHEN CAST(avg AS DOUBLE) < 1000000000 THEN CAST(ROUND(CAST(avg AS DOUBLE) / 1000000, 2) AS VARCHAR) || 'm'
    WHEN CAST(avg AS DOUBLE) < 1000000000000 THEN CAST(ROUND(CAST(avg AS DOUBLE) / 1000000000, 2) AS VARCHAR) || 'b'
    ELSE '∞'
  END AS avg,
  CASE
    WHEN std IS NULL THEN NULL
    WHEN TRY_CAST(std AS DOUBLE) IS NULL THEN CAST(std AS VARCHAR)
    WHEN CAST(std AS DOUBLE) < 100000 THEN CAST(ROUND(CAST(std AS DOUBLE), 2) AS VARCHAR)
    WHEN CAST(std AS DOUBLE) < 1000000 THEN CAST(ROUND(CAST(std AS DOUBLE) / 1000, 1) AS VARCHAR) || 'k'
    WHEN CAST(std AS DOUBLE) < 1000000000 THEN CAST(ROUND(CAST(std AS DOUBLE) / 1000000, 2) AS VARCHAR) || 'm'
    WHEN CAST(std AS DOUBLE) < 1000000000000 THEN CAST(ROUND(CAST(std AS DOUBLE) / 1000000000, 2) AS VARCHAR) || 'b'
    ELSE '∞'
  END AS std,
  CASE
    WHEN q25 IS NULL THEN NULL
    WHEN column_type = 'TIMESTAMP' THEN coalesce(strftime(try_strptime(q25::VARCHAR, '%%c.%%f'), '%%c'), q25::VARCHAR)
    WHEN TRY_CAST(q25 AS DOUBLE) IS NULL THEN CAST(q25 AS VARCHAR)
    WHEN CAST(q25 AS DOUBLE) < 100000 THEN CAST(ROUND(CAST(q25 AS DOUBLE), 2) AS VARCHAR)
    WHEN CAST(q25 AS DOUBLE) < 1000000 THEN CAST(ROUND(CAST(q25 AS DOUBLE) / 1000, 1) AS VARCHAR) || 'k'
    WHEN CAST(q25 AS DOUBLE) < 1000000000 THEN CAST(ROUND(CAST(q25 AS DOUBLE) / 1000000, 2) AS VARCHAR) || 'm'
    WHEN CAST(q25 AS DOUBLE) < 1000000000000 THEN CAST(ROUND(CAST(q25 AS DOUBLE) / 1000000000, 2) AS VARCHAR) || 'b'
    ELSE '∞'
  END AS q25,
  CASE
    WHEN q50 IS NULL THEN NULL
    WHEN column_type = 'TIMESTAMP' THEN coalesce(strftime(try_strptime(q50::VARCHAR, '%%c.%%f'), '%%c'), q50::VARCHAR)
    WHEN TRY_CAST(q50 AS DOUBLE) IS NULL THEN CAST(q50 AS VARCHAR)
    WHEN CAST(q50 AS DOUBLE) < 100000 THEN CAST(ROUND(CAST(q50 AS DOUBLE), 2) AS VARCHAR)
    WHEN CAST(q50 AS DOUBLE) < 1000000 THEN CAST(ROUND(CAST(q50 AS DOUBLE) / 1000, 1) AS VARCHAR) || 'k'
    WHEN CAST(q50 AS DOUBLE) < 1000000000 THEN CAST(ROUND(CAST(q50 AS DOUBLE) / 1000000, 2) AS VARCHAR) || 'm'
    WHEN CAST(q50 AS DOUBLE) < 1000000000000 THEN CAST(ROUND(CAST(q50 AS DOUBLE) / 1000000000, 2) AS VARCHAR) || 'b'
    ELSE '∞'
  END AS q50,
  CASE
    WHEN q75 IS NULL THEN NULL
    WHEN column_type = 'TIMESTAMP' THEN coalesce(strftime(try_strptime(q75::VARCHAR, '%%c.%%f'), '%%c'), q75::VARCHAR)
    WHEN TRY_CAST(q75 AS DOUBLE) IS NULL THEN CAST(q75 AS VARCHAR)
    WHEN CAST(q75 AS DOUBLE) < 100000 THEN CAST(ROUND(CAST(q75 AS DOUBLE), 2) AS VARCHAR)
    WHEN CAST(q75 AS DOUBLE) < 1000000 THEN CAST(ROUND(CAST(q75 AS DOUBLE) / 1000, 1) AS VARCHAR) || 'k'
    WHEN CAST(q75 AS DOUBLE) < 1000000000 THEN CAST(ROUND(CAST(q75 AS DOUBLE) / 1000000, 2) AS VARCHAR) || 'm'
    WHEN CAST(q75 AS DOUBLE) < 1000000000000 THEN CAST(ROUND(CAST(q75 AS DOUBLE) / 1000000000, 2) AS VARCHAR) || 'b'
    ELSE '∞'
  END AS q75
FROM %s
    ]],
    column_width,
    column_width,
    target
  )
end

-- Get preview cache path
local function get_cache_path(job, mode, extension)
  local suffix = "_" .. mode .. ".parquet"
  if extension then
    suffix = "_" .. extension .. "." .. extension
  end
  local cache_version = 3
  local skip = job.skip
  job.skip = 1000000 + cache_version
  local base = ya.file_cache(job)
  job.skip = skip

  if not base then
    return nil, nil
  end

  local base_str = tostring(base) .. suffix
  local path_url = Url(base_str)
  local path_str = tostring(path_url.name)
  return path_str, path_url
end

-- Run queries.
local function run_query(job, query, target, file_type)
  local width = math.max((job.area and job.area.w * 3 or 80), 80)
  local height = math.max((job.area and job.area.h or 25), 25)

  local args = {}

  if file_type == "duckdb" then
    table.insert(args, "-readonly")
    table.insert(args, tostring(target))
  elseif file_type == "excel" then
    add_queries_to_table(args, { "install spatial", "load spatial" })
  end

  -- Duckbox config
  add_queries_to_table(args, {
    ".mode duckbox",
    ".timer off",
    "SET enable_progress_bar = false;",
    string.format(".maxwidth %d", width),
    string.format(".maxrows %d", height),
    ".highlight_results on",
  })

  -- Add query or list of queries
  add_queries_to_table(args, query)

  local child = Command("duckdb"):arg(args):stdout(Command.PIPED):stderr(Command.PIPED):spawn()
  if not child then
    ya.err("Failed to spawn DuckDB")
    return nil
  end

  local output, err = child:wait_with_output()
  if err or not output.status.success then
    ya.err("DuckDB error: " .. (err or output.stderr or "[unknown error]"))
    return nil
  end

  return output
end

local function generate_db_query(limit, offset)
  local scroll = get_opts("scrolled_columns") or 0

  local metadata_fields = { "rows", "columns", "has_pk", "indexes" }
  local visible_column_count = 10
  local max_scroll_metadata = #metadata_fields
  local metadata_projection = { "table_name" }

  if scroll < max_scroll_metadata then
    for i = scroll + 1, #metadata_fields do
      table.insert(metadata_projection, metadata_fields[i])
    end
    table.insert(metadata_projection, "column_names") -- always show

    local projection = table.concat(metadata_projection, ", ")
    return string.format(
      [[
WITH table_info AS (
  SELECT
    DISTINCT t.table_name,
    t.estimated_size AS rows,
    t.column_count AS columns,
    t.has_primary_key AS has_pk,
    t.index_count AS indexes,
    STRING_AGG(c.column_name, ', ' ORDER BY c.column_index) OVER (PARTITION BY t.table_name) AS column_names
  FROM duckdb_tables() t
  LEFT JOIN duckdb_columns() c ON t.table_name = c.table_name
)
SELECT %s FROM table_info
ORDER BY table_name
LIMIT %d OFFSET %d;
]],
      projection,
      limit,
      offset
    )
  else
    local column_scroll = scroll - max_scroll_metadata
    local start_pos = column_scroll + 1
    local end_pos = column_scroll + visible_column_count

    return string.format(
      [[
WITH raw AS (
  SELECT
    t.table_name,
    c.column_name,
    row_number() OVER (PARTITION BY t.table_name ORDER BY c.column_index) AS col_pos
  FROM duckdb_tables() t
  LEFT JOIN duckdb_columns() c ON t.table_name = c.table_name
),
scrolling AS (
  SELECT
    table_name,
    column_name,
    col_pos
  FROM raw
  WHERE col_pos >= %d AND col_pos < %d
),
aggregated AS (
  SELECT
    table_name,
    STRING_AGG(column_name, ', ' ORDER BY col_pos) AS column_names
  FROM scrolling
  GROUP BY table_name
)
SELECT table_name, column_names FROM aggregated
ORDER BY table_name
LIMIT %d OFFSET %d;
]],
      start_pos,
      end_pos,
      limit,
      offset
    )
  end
end

local function generate_standard_query(target, job, limit, offset)
  local scroll = get_opts("scrolled_columns") or 0
  local actual_width = math.max((job.area and job.area.w or 80), 80)
  local column_fit_factor = get_opts("column_fit_factor") or 7
  local fetched_columns = math.floor(actual_width / column_fit_factor) + scroll
  local row_id_mode = get_opts("row_id")

  -- Determine if row_id should be prepended
  local row_id_prefix = ""
  local row_id_enabled = (row_id_mode == true) or (row_id_mode == "dynamic" and scroll > 0)
  if row_id_enabled then
    row_id_prefix = "row_number() over () as row, "
  end

  local included_columns_cte = string.format(
    [[
set variable included_columns = (
  with column_list as (
    select column_name, row_number() over () as row
    from (describe select * from %s)
  )
  select list(column_name)
  from column_list
  where row > %d and row <= (%d)
);
]],
    target,
    scroll,
    fetched_columns
  )

  local filtered_select = string.format(
    "select %scolumns(c -> list_contains(getvariable('included_columns'), c)) from %s limit %d offset %d;",
    row_id_prefix,
    target,
    limit,
    offset
  )
  return { included_columns_cte, filtered_select }
end
434434+435435+local function generate_summarized_query(source, limit, offset)
436436+ local scroll = get_opts("scrolled_columns") or 0
437437+438438+ -- These are the scrollable fields, in display order
439439+ local fields = {
440440+ '"type"',
441441+ '"count"',
442442+ '"unique"',
443443+ '"null%"',
444444+ '"min"',
445445+ '"max"',
446446+ '"avg"',
447447+ '"std"',
448448+ '"q25"',
449449+ '"q50"',
450450+ '"q75"',
451451+ }
452452+453453+ -- Always include the column name
454454+ local selected_fields = { '"column"' }
455455+456456+ -- Add scrollable fields from scroll onwards
457457+ for i = scroll + 1, #fields do
458458+ table.insert(selected_fields, fields[i])
459459+ end
460460+461461+ local summary_cte = generate_summary_cte(source)
462462+ local projection = table.concat(selected_fields, ", ")
463463+464464+ return string.format(
465465+ [[
466466+WITH summary_cte AS (
467467+ %s
468468+)
469469+SELECT %s FROM summary_cte LIMIT %d OFFSET %d;
470470+]],
471471+ summary_cte,
472472+ projection,
473473+ limit,
474474+ offset
475475+ )
476476+end
477477+478478+local function generate_peek_query(target, job, limit, offset, file_type, cache_str)
479479+ local mode = get_opts("mode")
480480+ local is_original_file = (target == job.file.url)
481481+482482+ -- If the file itself is a DuckDB database, list tables/columns
483483+ if is_original_file and file_type == "duckdb" then
484484+ return generate_db_query(limit, offset)
485485+ end
486486+487487+ local target_type = is_original_file and file_type or "cache"
488488+ local source = generate_data_source_string(target, target_type)
489489+490490+ if mode == "standard" then
491491+ return generate_standard_query(source, job, limit, offset)
492492+ end
493493+ local placeholder = "⏱"
494494+ if is_on_list("bad_cache", cache_str) then
495495+ placeholder = "∅"
496496+ end
497497+498498+ if file_type ~= "parquet" then
499499+ local summary_source = is_original_file
500500+ and string.format(
501501+ [[(select
502502+ column_name,
503503+ column_type,
504504+ ' %s ' as count,
505505+ ' %s ' as "approx_unique",
506506+ ' %s ' as "null_percentage",
507507+ ' %s ' as min,
508508+ ' %s ' as max,
509509+ ' %s ' as avg,
510510+ ' %s ' as std,
511511+ ' %s ' as q25,
512512+ ' %s ' as q50,
513513+ ' %s ' as q75
514514+ from (describe select * from %s))]],
515515+ placeholder,
516516+ placeholder,
517517+ placeholder,
518518+ placeholder,
519519+ placeholder,
520520+ placeholder,
521521+ placeholder,
522522+ placeholder,
523523+ placeholder,
524524+ placeholder,
525525+ source
526526+ )
527527+ or source
528528+ return generate_summarized_query(summary_source, limit, offset)
529529+ else
530530+ local summary_source = is_original_file
531531+ and string.format(
532532+ [[
533533+ (select
534534+ d.column_name,
535535+ d.column_type,
536536+ sum(m.num_values) as count,
537537+ ' %s ' as "approx_unique",
538538+ ' %s ' as "null_percentage",
539539+ case when min(m.stats_min) is null then '%s' else min(m.stats_min) end as min,
540540+ case when min(m.stats_max) is null then '%s' else max(m.stats_max) end as max,
541541+ ' %s ' as "avg",
542542+ ' %s ' as "std",
543543+ ' %s ' as q25,
544544+ ' %s ' as q50,
545545+ ' %s ' as q75
546546+ from (describe select * from %s) d
547547+ left join parquet_metadata(%s) m
548548+ on d.column_name = m.path_in_schema
549549+ group by all
550550+ order by min(column_id))
551551+ ]],
552552+ placeholder,
553553+ placeholder,
554554+ placeholder,
555555+ placeholder,
556556+ placeholder,
557557+ placeholder,
558558+ placeholder,
559559+ placeholder,
560560+ placeholder,
561561+ source,
562562+ source
563563+ )
564564+ or source
565565+ return generate_summarized_query(summary_source, limit, offset)
566566+ end
567567+end
568568+569569+local function render_output(output, job)
570570+ local cleaned = output.stdout and output.stdout:gsub("\r", "") or "[no output]"
571571+ ya.preview_widget(job, {
572572+ ui.Text.parse(cleaned):area(job.area),
573573+ })
574574+end
575575+576576+local function output_is_valid(output, mode, job)
577577+ if output then
578578+ if output.stderr and output.stderr ~= "" then
579579+ ya.err("DuckDB returned an error or:\n" .. output.stderr)
580580+ return false
581581+ elseif not output.stdout or output.stdout == "" then
582582+ ya.err(string.format("Peek - No stdout/stderr from %s cache for %s", mode, job.file.url))
583583+ return false
584584+ else
585585+ return true
586586+ end
587587+ else
588588+ ya.err("Duckdb failed to return output")
589589+ return false
590590+ end
591591+end
592592+593593+local function prepare_peek_context(job)
594594+ local file_url = job.file.url
595595+ local re_peek = get_opts("re_peek")
596596+ local mode = get_opts("mode")
597597+ local mode_changed = get_opts("mode_changed")
598598+599599+ -- Handle scroll reset and peek triggering
600600+ if not re_peek then
601601+ local raw_skip = job.skip or 0
602602+ if raw_skip == 0 or mode_changed then
603603+ set_opts("scrolled_columns", 0)
604604+ end
605605+ if mode_changed then
606606+ set_opts("mode_changed", false)
607607+ end
608608+ job.skip = math.max(0, raw_skip - 50)
609609+ end
610610+ set_opts("re_peek", false)
611611+612612+ local cache_str, cache_url = get_cache_path(job, mode)
613613+ local scrolled_collumns = get_opts("scrolled_columns")
614614+615615+ local use_cache = cache_url
616616+ and fs.cha(cache_url)
617617+ and not is_on_list("preloading", cache_str)
618618+ and not is_on_list("bad_cache", cache_str)
619619+620620+ local target = use_cache and cache_url or file_url
621621+ local file_type = check_file_type(target)
622622+ local area = job.area or { h = 25 }
623623+ local limit = area.h - 7
624624+ local offset = job.skip
625625+626626+ return {
627627+ file_url = file_url,
628628+ mode = mode,
629629+ file_type = file_type,
630630+ cache_str = cache_str,
631631+ cache_url = cache_url,
632632+ scrolled_collumns = scrolled_collumns,
633633+ use_cache = use_cache,
634634+ target = target,
635635+ limit = limit,
636636+ offset = offset,
637637+ }
638638+end
639639+640640+local function remove_file(cache_url)
641641+ if fs.cha(cache_url) then
642642+ local ok, err = fs.remove("file", cache_url)
643643+ if not ok then
644644+ ya.err(
645645+ string.format("[duckdb] failed to remove partial cache at %s: %s", tostring(cache_url), tostring(err))
646646+ )
647647+ end
648648+ end
649649+end
650650+651651+local function finish_preload(success, cache_str1, cache_str2)
652652+ for _, cache_str in ipairs({ cache_str1, cache_str2 }) do
653653+ if not success then
654654+ add_to_list("bad_cache", cache_str)
655655+ end
656656+ remove_from_list("preloading", cache_str)
657657+ add_to_list("completed", cache_str)
658658+ end
659659+ return success
660660+end
661661+662662+local function create_cache(job, mode, file_type, limit)
663663+ local cache_str, cache_url = get_cache_path(job, mode)
664664+ if not cache_url or fs.cha(cache_url) or is_on_list("bad_cache", cache_str) then
665665+ return true
666666+ end
667667+668668+ add_to_list("preloading", cache_str)
669669+670670+ local target = tostring(cache_url)
671671+672672+ local base_query = generate_preload_query(job, mode, file_type, limit)
673673+ local query = string.format("COPY (%s) TO '%s' (FORMAT 'parquet');", base_query, target)
674674+ local output = run_query(job, query, nil, file_type)
675675+ ya.dbg("stdout: " .. tostring(output.stdout))
676676+ ya.dbg("stderr: " .. tostring(output.stderr))
677677+678678+ if not output or (output.stderr and output.stderr ~= "") then
679679+ ya.err(
680680+ output
681681+ and string.format(
682682+ "[duckdb] error creating %s cache for %s: %s",
683683+ mode,
684684+ tostring(job.file.url),
685685+ output.stderr
686686+ )
687687+ or string.format(
688688+ "[duckdb] no output returned while creating %s cache for %s",
689689+ mode,
690690+ tostring(job.file.url)
691691+ )
692692+ )
693693+ remove_file(cache_url)
694694+ local result = finish_preload(false, cache_str)
695695+ return result
696696+ end
697697+698698+ local result = finish_preload(true, cache_str)
699699+ return result
700700+end
701701+702702+local function is_plain_text(job, file_type)
703703+ local file_hash, _ = get_cache_path(job, "standard", "text")
704704+ if is_on_list("is_plain_text", file_hash) then
705705+ return true
706706+ end
707707+708708+ file_type = file_type or check_file_type(job.file.url)
709709+ if file_type ~= "text" then
710710+ return false
711711+ end
712712+713713+ local query = {
714714+ ".mode csv",
715715+ ".headers off",
716716+ string.format("select count(column_name) from (describe from read_csv('%s'));", tostring(job.file.url)),
717717+ }
718718+ local output = run_query(job, query, nil, file_type)
719719+ local result = (output and output.stdout == "1\r\n")
720720+721721+ if result then
722722+ add_to_list("is_plain_text", file_hash)
723723+ end
724724+725725+ return result
726726+end
727727+728728+-- Preload summarized and standard preview caches
729729+function M:preload(job)
730730+ if is_plain_text(job, nil) then
731731+ return true
732732+ end
733733+ local limit = get_opts("limit")
734734+ local file_type = check_file_type(job.file.url)
735735+ local all_done = true
736736+737737+ if file_type == "duckdb" then
738738+ return true
739739+ end
740740+741741+ for _, mode in ipairs({ "standard", "summarized" }) do
742742+ local success = create_cache(job, mode, file_type, limit)
743743+ if not success then
744744+ all_done = false
745745+ end
746746+ end
747747+748748+ return all_done
749749+end
750750+751751+-- Peek with mode toggle if scrolling at top
752752+function M:peek(job)
753753+ local args = prepare_peek_context(job)
754754+ if is_plain_text(job, args.file_type) then
755755+ return require("code"):peek(job)
756756+ end
757757+758758+ local query = generate_peek_query(args.target, job, args.limit, args.offset, args.file_type, args.cache_str)
759759+ ya.dbg("query: " .. tostring(query))
760760+ local output = run_query(job, query, args.target, args.file_type)
761761+ ya.dbg("stdout: " .. tostring(output.stdout))
762762+ ya.dbg("stderr: " .. tostring(output.stderr))
763763+ if not output_is_valid(output, args.mode, job) then
764764+ if args.target == args.cache_url and args.scrolled_collumns == 0 then
765765+ add_to_list("bad_cache", args.cache_str)
766766+ remove_file(args.cache_url)
767767+ return require("duckdb"):peek(job)
768768+ elseif is_on_list("bad_cache", args.cache_str) then
769769+ return require("code"):peek(job)
770770+ end
771771+ end
772772+773773+ if args.target == args.file_url and args.mode == "summarized" and not args.use_cache then
774774+ render_output(output, job)
775775+ while not is_on_list("completed", args.cache_str) do
776776+ ya.sleep(0.2)
777777+ end
778778+ clear_list("completed")
779779+ set_opts("re_peek", true)
780780+ return require("duckdb"):peek(job)
781781+ end
782782+783783+ render_output(output, job)
784784+end
785785+786786+-- Seek, also triggers mode change if skip negative.
787787+function M:seek(job)
788788+ local OFFSET_BASE = 50
789789+ local current_skip = math.max(0, cx.active.preview.skip - OFFSET_BASE)
790790+ local units = job.units or 0
791791+ local new_skip = current_skip + units
792792+793793+ if new_skip < 0 then
794794+ -- Toggle preview mode
795795+ local mode = get_opts("mode")
796796+ local new_mode = (mode == "summarized") and "standard" or "summarized"
797797+ set_opts("mode", new_mode)
798798+ set_opts("mode_changed", true)
799799+ -- Trigger re-peek
800800+ ya.emit("peek", { OFFSET_BASE, only_if = job.file.url })
801801+ else
802802+ ya.emit("peek", { new_skip + OFFSET_BASE, only_if = job.file.url })
803803+ end
804804+end
805805+806806+return M
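How `generate_standard_query` chooses which columns to fetch can be sketched outside Lua. This is an illustration only (Python for brevity; the function name and tuple return are mine), mirroring the arithmetic above: fill the pane width at roughly `column_fit_factor` cells per column, then shift the window right by the horizontal scroll.

```python
# Sketch of the column-window arithmetic in generate_standard_query above.
# visible_column_window is a hypothetical helper, not part of the plugin.
def visible_column_window(area_width, scrolled_columns, column_fit_factor=7):
    # Fall back to 80 cells when the area is missing or narrower, as the Lua does.
    actual_width = max(area_width or 80, 80)
    fetched_columns = actual_width // column_fit_factor + scrolled_columns
    # The SQL keeps columns whose 1-based position p satisfies
    # scrolled_columns < p <= fetched_columns.
    return scrolled_columns + 1, fetched_columns

print(visible_column_window(120, 0))
print(visible_column_window(120, 3))
```

So scrolling one column to the right both drops the leftmost visible column and fetches one more on the right, keeping the pane full.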
···
-# duckdb.yazi
-
-[duckdb](https://github.com/duckdb/duckdb) now in [yazi](https://github.com/sxyazi/yazi).
-
-<img width="1710" alt="Screenshot 2025-03-22 at 18 00 06" src="https://github.com/user-attachments/assets/db09fff9-2db1-4273-9ddf-34d0bf087967" />
-
-## Installation
-
-To install, use the command:
-
-ya pack -a wylie102/duckdb
-
-and add to your yazi.toml:
-
-[plugin]
-prepend_previewers = [
- { mime = "text/csv", run = "duckdb" },
- { name = "*.tsv", run = "duckdb" },
- { name = "*.json", run = "duckdb" },
- { name = "*.parquet", run = "duckdb" },
-]
-
-prepend_preloaders = [
- { mime = "text/csv", run = "duckdb", multi = false },
- { name = "*.tsv", run = "duckdb", multi = false },
- { name = "*.json", run = "duckdb", multi = false },
- { name = "*.parquet", run = "duckdb", multi = false },
-]
-
-### Yazi
-
-[Installation instructions](https://yazi-rs.github.io/docs/installation)
-
-### duckdb
-
-[Installation instructions](https://duckdb.org/docs/installation/?version=stable&environment=cli&platform=macos&download_method=direct)
-
-## Recommended plugins
-
-Use with a larger preview window, or maximize the preview pane with the toggle-pane plugin:
-<https://github.com/yazi-rs/plugins/tree/main/toggle-pane.yazi>
-
-## What does it do?
-
-This plugin previews your data files in yazi using DuckDB, with two available view modes:
-
-- Standard mode (default): Displays the file as a table.
-- Summarized mode: Uses DuckDB's summarize function, enhanced with custom formatting for readability.
-
-Supported file types:
-
-- .csv
-- .json
-- .parquet
-- .tsv
-
-## New Features
-
-- Default preview mode is now "standard."
-- Preview mode can be toggled within yazi:
- - Press "K" at the top of the file to toggle between "standard" and "summarized."
-- Preview mode is remembered per file, even after switching files or restarting yazi.
-- Performance improvements through caching:
- - "Standard" and "summarized" views are cached upon first load, improving scrolling performance.
-
-## Setup and usage changes
-
-Previously, preview mode was selected by setting an environment variable (`DUCKDB_PREVIEW_MODE`).
-
-The new version no longer uses environment variables. Toggle preview modes directly within yazi using the keybinding described above.
-
-Scrolling within both views (standard and summarized) is handled by pressing J (down) and K (up). Performance is significantly better due to caching.
-
-## Preview
-
-<img width="1710" alt="Screenshot 2025-03-22 at 17 59 21" src="https://github.com/user-attachments/assets/ac006667-4281-4e0a-87a4-bfaeefc6f20b" />
yazi/plugins/duckdb.yazi/main.lua
···
--- This function generates the SQL query based on the preview mode.
-local function generate_sql(job, mode)
- if mode == "standard" then
- return string.format("SELECT * FROM '%s' LIMIT 500", tostring(job.file.url))
- else
- return string.format(
- [[SELECT
- column_name AS column,
- column_type AS type,
- count,
- approx_unique AS unique,
- null_percentage AS null,
- LEFT(min, 10) AS min,
- LEFT(max, 10) AS max,
- CASE
- WHEN column_type IN ('TIMESTAMP', 'DATE') THEN '-'
- WHEN avg IS NULL THEN 'NULL'
- WHEN TRY_CAST(avg AS DOUBLE) IS NULL THEN avg
- WHEN CAST(avg AS DOUBLE) < 100000 THEN CAST(ROUND(CAST(avg AS DOUBLE), 2) AS VARCHAR)
- WHEN CAST(avg AS DOUBLE) < 1000000 THEN CAST(ROUND(CAST(avg AS DOUBLE) / 1000, 1) AS VARCHAR) || 'k'
- WHEN CAST(avg AS DOUBLE) < 1000000000 THEN CAST(ROUND(CAST(avg AS DOUBLE) / 1000000, 2) AS VARCHAR) || 'm'
- WHEN CAST(avg AS DOUBLE) < 1000000000000 THEN CAST(ROUND(CAST(avg AS DOUBLE) / 1000000000, 2) AS VARCHAR) || 'b'
- ELSE '∞'
- END AS avg,
- CASE
- WHEN column_type IN ('TIMESTAMP', 'DATE') THEN '-'
- WHEN std IS NULL THEN 'NULL'
- WHEN TRY_CAST(std AS DOUBLE) IS NULL THEN std
- WHEN CAST(std AS DOUBLE) < 100000 THEN CAST(ROUND(CAST(std AS DOUBLE), 2) AS VARCHAR)
- WHEN CAST(std AS DOUBLE) < 1000000 THEN CAST(ROUND(CAST(std AS DOUBLE) / 1000, 1) AS VARCHAR) || 'k'
- WHEN CAST(std AS DOUBLE) < 1000000000 THEN CAST(ROUND(CAST(std AS DOUBLE) / 1000000, 2) AS VARCHAR) || 'm'
- WHEN CAST(std AS DOUBLE) < 1000000000000 THEN CAST(ROUND(CAST(std AS DOUBLE) / 1000000000, 2) AS VARCHAR) || 'b'
- ELSE '∞'
- END AS std,
- CASE
- WHEN column_type IN ('TIMESTAMP', 'DATE') THEN '-'
- WHEN q25 IS NULL THEN 'NULL'
- WHEN TRY_CAST(q25 AS DOUBLE) IS NULL THEN q25
- WHEN CAST(q25 AS DOUBLE) < 100000 THEN CAST(ROUND(CAST(q25 AS DOUBLE), 2) AS VARCHAR)
- WHEN CAST(q25 AS DOUBLE) < 1000000 THEN CAST(ROUND(CAST(q25 AS DOUBLE) / 1000, 1) AS VARCHAR) || 'k'
- WHEN CAST(q25 AS DOUBLE) < 1000000000 THEN CAST(ROUND(CAST(q25 AS DOUBLE) / 1000000, 2) AS VARCHAR) || 'm'
- WHEN CAST(q25 AS DOUBLE) < 1000000000000 THEN CAST(ROUND(CAST(q25 AS DOUBLE) / 1000000000, 2) AS VARCHAR) || 'b'
- ELSE '∞'
- END AS q25,
- CASE
- WHEN column_type IN ('TIMESTAMP', 'DATE') THEN '-'
- WHEN q50 IS NULL THEN 'NULL'
- WHEN TRY_CAST(q50 AS DOUBLE) IS NULL THEN q50
- WHEN CAST(q50 AS DOUBLE) < 100000 THEN CAST(ROUND(CAST(q50 AS DOUBLE), 2) AS VARCHAR)
- WHEN CAST(q50 AS DOUBLE) < 1000000 THEN CAST(ROUND(CAST(q50 AS DOUBLE) / 1000, 1) AS VARCHAR) || 'k'
- WHEN CAST(q50 AS DOUBLE) < 1000000000 THEN CAST(ROUND(CAST(q50 AS DOUBLE) / 1000000, 2) AS VARCHAR) || 'm'
- WHEN CAST(q50 AS DOUBLE) < 1000000000000 THEN CAST(ROUND(CAST(q50 AS DOUBLE) / 1000000000, 2) AS VARCHAR) || 'b'
- ELSE '∞'
- END AS q50,
- CASE
- WHEN column_type IN ('TIMESTAMP', 'DATE') THEN '-'
- WHEN q75 IS NULL THEN 'NULL'
- WHEN TRY_CAST(q75 AS DOUBLE) IS NULL THEN q75
- WHEN CAST(q75 AS DOUBLE) < 100000 THEN CAST(ROUND(CAST(q75 AS DOUBLE), 2) AS VARCHAR)
- WHEN CAST(q75 AS DOUBLE) < 1000000 THEN CAST(ROUND(CAST(q75 AS DOUBLE) / 1000, 1) AS VARCHAR) || 'k'
- WHEN CAST(q75 AS DOUBLE) < 1000000000 THEN CAST(ROUND(CAST(q75 AS DOUBLE) / 1000000, 2) AS VARCHAR) || 'm'
- WHEN CAST(q75 AS DOUBLE) < 1000000000000 THEN CAST(ROUND(CAST(q75 AS DOUBLE) / 1000000000, 2) AS VARCHAR) || 'b'
- ELSE '∞'
- END AS q75
- FROM (summarize FROM '%s')]],
- tostring(job.file.url)
- )
- end
-end
-
-local function get_cache_path(job, type)
- local skip = job.skip
- job.skip = 0
- local base = ya.file_cache(job)
- job.skip = skip
- if not base then
- return nil
- end
- local suffix = ({ standard = "_standard.db", summarized = "_summarized.db", mode = "_mode.db" })[type or "standard"]
- return Url(tostring(base) .. suffix)
-end
-
-local function run_query(job, query, target)
- local args = {}
- if target ~= job.file.url then
- table.insert(args, tostring(target))
- end
- table.insert(args, "-c")
- table.insert(args, query)
- local child = Command("duckdb"):args(args):stdout(Command.PIPED):stderr(Command.PIPED):spawn()
- if not child then
- return nil
- end
- local output, err = child:wait_with_output()
- if err then
- return nil
- end
- if not output.status.success then
- ya.err("DuckDB exited with error: " .. output.stderr)
- return nil
- end
- return output
-end
-
-local function create_cache(job, mode, path)
- local filename = job.file.url:name() or "unknown"
- if fs.cha(path) then
- return true
- end
- local sql = (mode == "mode") and "CREATE TABLE My_table AS SELECT 'standard' AS Preview_mode;"
- or string.format("CREATE TABLE My_table AS (%s);", generate_sql(job, mode))
- local out = run_query(job, sql, path, mode == "mode" and "mode" or nil)
- if not out then
- ya.err("Preload - Failed to generate " .. mode .. " cache for file: " .. tostring(filename) .. ".")
- return false
- end
- return true
-end
-
-local function get_preview_mode(job)
- local mode = "standard"
- local mode_cache = get_cache_path(job, "mode")
- if not mode_cache then
- return mode
- end
- if not fs.cha(mode_cache) then
- create_cache(job, "mode", mode_cache)
- end
- local result = run_query(job, "SELECT Preview_mode FROM My_table LIMIT 1;", mode_cache, "mode")
- if result and result.stdout and result.stdout ~= "" then
- local value = result.stdout:lower()
- if value:match("summarized") then
- mode = "summarized"
- end
- end
- return mode
-end
-
-local function generate_query(target, job, limit, offset)
- local mode = get_preview_mode(job)
- if target == job.file.url then
- if mode == "standard" then
- return string.format("SELECT * FROM '%s' LIMIT %d OFFSET %d;", tostring(target), limit, offset)
- else
- local query = generate_sql(job, mode)
- return string.format("WITH query AS (%s) SELECT * FROM query LIMIT %d OFFSET %d;", query, limit, offset)
- end
- else
- return string.format("SELECT * FROM My_table LIMIT %d OFFSET %d;", limit, offset)
- end
-end
-
-local function set_preview_mode(job, mode)
- local mode_cache = get_cache_path(job, "mode")
- if not mode_cache then
- return false
- end
- run_query(job, "DELETE FROM My_table;", mode_cache, "mode")
- local sql = string.format("INSERT INTO My_table VALUES ('%s');", mode)
- local result = run_query(job, sql, mode_cache, "mode")
- if not result then
- ya.err("SetPreviewMode - Failed to update preview mode.")
- return false
- end
- return true
-end
-
-local M = {}
-
-function M:preload(job)
- local cache_standard = get_cache_path(job, "standard")
- local cache_summarized = get_cache_path(job, "summarized")
- if not cache_standard or not cache_summarized then
- return false
- end
- if fs.cha(cache_standard) and fs.cha(cache_summarized) then
- return true
- end
- local success = true
- success = create_cache(job, "standard", cache_standard) and success
- success = create_cache(job, "summarized", cache_summarized) and success
- return success
-end
-
-function M:peek(job)
- local raw_skip = job.skip or 0
- local skip = math.max(0, raw_skip - 50)
- if raw_skip > 0 and raw_skip < 50 then
- local current_mode = get_preview_mode(job)
- local new_mode = current_mode == "standard" and "summarized" or "standard"
- set_preview_mode(job, new_mode)
- skip = 0
- end
- job.skip = skip
- local mode = get_preview_mode(job)
- local cache = get_cache_path(job, mode)
- local file_url = job.file.url
- local target = cache
- local limit = job.area.h - 7
- local offset = skip
- if not cache or not fs.cha(cache) then
- target = file_url
- end
- local query = generate_query(target, job, limit, offset)
- local output = run_query(job, query, target)
- if not output or output.stdout == "" then
- if target ~= file_url then
- target = file_url
- query = generate_query(target, job, limit, offset)
- output = run_query(job, query, target)
- if not output or output.stdout == "" then
- return require("code"):peek(job)
- end
- else
- return require("code"):peek(job)
- end
- end
- ya.preview_widgets(job, { ui.Text.parse(output.stdout):area(job.area) })
-end
-
-function M:seek(job)
- local OFFSET_BASE = 50
- local encoded_current_skip = cx.active.preview.skip or 0
- local current_skip = math.max(0, encoded_current_skip - OFFSET_BASE)
- local units = job.units or 0
- local new_skip = current_skip + units
- local encoded_skip = new_skip + OFFSET_BASE
- ya.manager_emit("peek", { encoded_skip, only_if = job.file.url })
-end
-
-return M
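The removed `seek`/`peek` pair above communicates through an encoded skip value: `seek` always emits `skip + OFFSET_BASE`, so an encoded value that lands strictly between 0 and `OFFSET_BASE` can only mean the user scrolled up past the top, which `peek` treats as a mode toggle. A minimal model of that convention (Python for illustration; function names are mine):

```python
# Model of the skip encoding in the removed seek()/peek() pair above.
OFFSET_BASE = 50

def seek(encoded_current_skip, units):
    """Return the encoded skip value seek() emits back to peek()."""
    current_skip = max(0, encoded_current_skip - OFFSET_BASE)
    return current_skip + units + OFFSET_BASE

def peek_decode(raw_skip):
    """Return (row_offset, mode_toggled) as peek() interprets the value."""
    toggled = 0 < raw_skip < OFFSET_BASE  # scrolled above the top of the file
    row_offset = 0 if toggled else max(0, raw_skip - OFFSET_BASE)
    return row_offset, toggled
```

For example, pressing K at the top (units `-5`, current encoded skip `50`) yields an encoded `45`, which decodes to row 0 with a mode toggle, while ordinary scrolling stays at or above `OFFSET_BASE` and decodes to a plain row offset.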
yazi/yazi.toml → yazi/.config/yazi/yazi.toml
···
 { name = "*.tsv", run = "duckdb" },
 { name = "*.json", run = "duckdb" },
 { name = "*.parquet", run = "duckdb" },
+ { name = "*.txt", run = "duckdb" },
+ { name = "*.xlsx", run = "duckdb" },
+ { name = "*.db", run = "duckdb" },
+ { name = "*.duckdb", run = "duckdb" }
 ]
 
 prepend_preloaders = [
···
 { name = "*.tsv", run = "duckdb", multi = false },
 { name = "*.json", run = "duckdb", multi = false },
 { name = "*.parquet", run = "duckdb", multi = false },
- { name = "*.db", run = "duckdb" },
- { name = "*.duckdb", run = "duckdb" },
+ { name = "*.txt", run = "duckdb", multi = false },
+ { name = "*.xlsx", run = "duckdb", multi = false }
 ]
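Applied to a full config, these hunks leave the plugin lists looking roughly like this. This is a sketch: the `text/csv` entries come from the README's earlier example rather than from the visible hunk context, and surrounding lines of the file are omitted.

```toml
[plugin]
prepend_previewers = [
  { mime = "text/csv", run = "duckdb" },
  { name = "*.tsv", run = "duckdb" },
  { name = "*.json", run = "duckdb" },
  { name = "*.parquet", run = "duckdb" },
  { name = "*.txt", run = "duckdb" },
  { name = "*.xlsx", run = "duckdb" },
  { name = "*.db", run = "duckdb" },
  { name = "*.duckdb", run = "duckdb" }
]

prepend_preloaders = [
  { mime = "text/csv", run = "duckdb", multi = false },
  { name = "*.tsv", run = "duckdb", multi = false },
  { name = "*.json", run = "duckdb", multi = false },
  { name = "*.parquet", run = "duckdb", multi = false },
  { name = "*.txt", run = "duckdb", multi = false },
  { name = "*.xlsx", run = "duckdb", multi = false }
]
```

Note that `*.db` and `*.duckdb` move out of `prepend_preloaders`: database files are previewed directly and are no longer preloaded.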