<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://thomasd.be/feed.xml" rel="self" type="application/atom+xml" /><link href="https://thomasd.be/" rel="alternate" type="text/html" /><updated>2026-02-06T12:02:28+01:00</updated><id>https://thomasd.be/feed.xml</id><title type="html">Thomas Daniels’s blog</title><subtitle>My personal blog about programming and other topics to be determined.</subtitle><entry><title type="html">Aix: Efficiently storing and querying chess game collections</title><link href="https://thomasd.be/2026/02/01/aix-storing-querying-chess-games.html" rel="alternate" type="text/html" title="Aix: Efficiently storing and querying chess game collections" /><published>2026-02-01T14:02:00+01:00</published><updated>2026-02-01T14:02:00+01:00</updated><id>https://thomasd.be/2026/02/01/aix-storing-querying-chess-games</id><content type="html" xml:base="https://thomasd.be/2026/02/01/aix-storing-querying-chess-games.html"><![CDATA[<p>The <a href="https://database.lichess.org/">Lichess database</a> contains over 7 billion chess games played on <a href="https://lichess.org/">Lichess</a>. To make it easier to query these games, I have released the open-source <a href="https://github.com/thomas-daniels/aix">Aix extension for DuckDB</a> and the accompanying <a href="https://huggingface.co/datasets/thomasd1/aix-lichess-database">Aix-compatible Lichess database</a>.</p>

<ul id="markdown-toc">
  <li><a href="#introduction" id="markdown-toc-introduction">Introduction</a></li>
  <li><a href="#getting-started-with-aix" id="markdown-toc-getting-started-with-aix">Getting started with Aix</a></li>
  <li><a href="#space-efficient-storage" id="markdown-toc-space-efficient-storage">Space-efficient storage</a></li>
  <li><a href="#querying-chess-games" id="markdown-toc-querying-chess-games">Querying chess games</a></li>
  <li><a href="#details-of-medium-compression" id="markdown-toc-details-of-medium-compression">Details of Medium compression</a></li>
  <li><a href="#try-it-out" id="markdown-toc-try-it-out">Try it out!</a></li>
</ul>

<h2 id="introduction">Introduction</h2>

<p>Every month, the <a href="https://database.lichess.org/">Lichess database</a> grows by about 100 million games. Altogether, the compressed PGN files add up to over 2 TB. Uncompressed, they would be over 15 TB. This is not trivial to query.</p>

<p>Several tools exist that can handle queries on large chess databases, such as <a href="https://github.com/mcostalba/scoutfish">Scoutfish</a>, <a href="https://www.cs.kent.ac.uk/people/staff/djb/pgn-extract/">pgn-extract</a> or <a href="https://www.gadycosteff.com/cql/">CQL</a>. These tools are very useful, but also have their limitations:</p>

<ul>
  <li>They require uncompressed PGN files, which can take up a lot of space.</li>
  <li>They only implement filtering; other query operations (e.g., aggregations) are not supported.</li>
  <li>They only execute queries on the game’s moves (and perhaps PGN tags), while clock times and engine evaluations are out of scope.</li>
</ul>

<p>Any processing beyond the above items would need custom code with the help of libraries such as <a href="https://github.com/niklasf/python-chess"><code class="language-plaintext highlighter-rouge">python-chess</code></a> or <a href="https://github.com/niklasf/shakmaty"><code class="language-plaintext highlighter-rouge">shakmaty</code></a>.</p>

<p>To make it easier to query such massive databases, I developed <a href="https://github.com/thomas-daniels/aix">Aix</a>. My goals were to support:</p>

<ol>
  <li>Space-efficient game storage</li>
  <li>Querying on game metadata (PGN tags)</li>
  <li>Querying on moves/positions</li>
  <li>Querying on clocks and engine evaluations</li>
  <li>Query operations beyond filtering</li>
  <li>Parallelized processing</li>
</ol>

<p>And now, with the <a href="https://github.com/thomas-daniels/aix">Aix extension for DuckDB</a> and the <a href="https://huggingface.co/datasets/thomasd1/aix-lichess-database">Aix-compatible Lichess database</a> (or the <a href="https://crates.io/crates/pgn-to-aix"><code class="language-plaintext highlighter-rouge">pgn-to-aix</code></a> command-line tool), all of this is possible: you can execute SQL queries over chess games. For example, the following query generates a <a href="https://github.com/Ramon-Deniz/ChessData?tab=readme-ov-file#heatmap-of-chess-moves">heatmap of king move destinations</a>:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">with</span> <span class="n">king_destinations</span> <span class="k">as</span> <span class="p">(</span>
    <span class="k">select</span>
        <span class="n">move_details</span><span class="p">(</span><span class="n">movedata</span><span class="p">)</span>
            <span class="p">.</span><span class="n">list_filter</span><span class="p">(</span><span class="n">lambda</span> <span class="n">m</span><span class="p">:</span> <span class="n">m</span><span class="p">.</span><span class="k">role</span> <span class="o">=</span> <span class="s1">'k'</span><span class="p">)</span>
            <span class="p">.</span><span class="n">apply</span><span class="p">(</span><span class="n">lambda</span> <span class="n">m</span><span class="p">:</span> <span class="n">m</span><span class="p">.</span><span class="k">to</span><span class="p">)</span>
        <span class="k">as</span> <span class="n">destinations</span>
    <span class="k">from</span> <span class="s1">'aix_lichess_2025-12_low.parquet'</span>
<span class="p">),</span>
<span class="n">unnested</span> <span class="k">as</span> <span class="p">(</span>
    <span class="k">select</span> <span class="k">unnest</span><span class="p">(</span><span class="n">destinations</span><span class="p">)</span> <span class="k">as</span> <span class="n">destination</span> <span class="k">from</span> <span class="n">king_destinations</span>
<span class="p">),</span>
<span class="n">aggregated</span> <span class="k">as</span> <span class="p">(</span>
    <span class="k">select</span> <span class="n">destination</span><span class="p">,</span> <span class="k">count</span><span class="p">()</span> <span class="k">from</span> <span class="n">unnested</span> <span class="k">group</span> <span class="k">by</span> <span class="mi">1</span> <span class="k">order</span> <span class="k">by</span> <span class="mi">2</span> <span class="k">desc</span>
<span class="p">)</span>

<span class="k">from</span> <span class="n">aggregated</span><span class="p">;</span>
</code></pre></div></div>

<p>This results in:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌─────────────┬──────────────┐
│ destination │ count_star() │
│   varchar   │    int64     │
├─────────────┼──────────────┤
│ g1          │     74020594 │
│ g8          │     71579360 │
│ g7          │     23388424 │
...
</code></pre></div></div>

<p>The <a href="https://github.com/thomas-daniels/aix/blob/main/docs/functions.md#move_details"><code class="language-plaintext highlighter-rouge">move_details</code> function</a> from the Aix extension makes this possible – the other functions (such as <a href="https://duckdb.org/docs/stable/sql/functions/list#list_filterlist-lambdax"><code class="language-plaintext highlighter-rouge">list_filter</code></a>) are core functions from DuckDB and are very powerful in combination with the Aix functions.</p>

<p>The Aix-compatible Lichess database offers three compression levels for the encoded chess games: Low, Medium, and High. A lower compression level provides faster decoding, so the choice between these levels enables a trade-off between speed and space usage. On the December 2025 file with Low compression (15.5 GB), the heatmap query takes about 92 seconds (AMD Ryzen Threadripper 3960X, with DuckDB thread count limited to 24). On the Medium (13.5 GB) and High (11.8 GB) compression files, this takes 104 and 241 seconds respectively.</p>

<h2 id="getting-started-with-aix">Getting started with Aix</h2>

<p>Start by <a href="https://duckdb.org/install/">installing DuckDB 1.4.4</a>, then install the Aix extension. The extension is now available in the Community Extensions as <code class="language-plaintext highlighter-rouge">aixchess</code>, so you can install it with the following DuckDB command:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">INSTALL</span> <span class="n">aixchess</span> <span class="k">FROM</span> <span class="n">community</span><span class="p">;</span>
</code></pre></div></div>

<p>(If you installed the extension earlier from the files in the GitHub release, do <code class="language-plaintext highlighter-rouge">FORCE INSTALL aixchess FROM community;</code>.)</p>

<p>Then load the extension (also do this for every new DuckDB session):</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">LOAD</span> <span class="n">aixchess</span><span class="p">;</span>
</code></pre></div></div>

<p>Now all of the extension functions are available. Download one of the <a href="https://huggingface.co/datasets/thomasd1/aix-lichess-database">Aix-compatible Lichess database files on Hugging Face</a> and try out the above heatmap query! To experiment on a smaller file, download one of the older months or derive a smaller file from a large one using SQL’s <code class="language-plaintext highlighter-rouge">LIMIT</code> clause.</p>

<p>(At the time of writing, the dataset on Hugging Face is still very incomplete, as I’m still generating and uploading Aix files. If you want to try Aix on a month that’s not available yet, you can use <a href="https://github.com/thomas-daniels/aix/tree/main/pgn-to-aix"><code class="language-plaintext highlighter-rouge">pgn-to-aix</code></a> to generate it yourself.)</p>

<p>If you have your own PGN file that you want to use Aix on, you first need to convert it to an Aix-compatible Parquet file using <a href="https://crates.io/crates/pgn-to-aix"><code class="language-plaintext highlighter-rouge">pgn-to-aix</code></a>.</p>

<p>The <a href="https://github.com/thomas-daniels/aix/blob/main/docs/functions.md">functions documentation</a> provides further guidance, and the <a href="https://github.com/thomas-daniels/aix/tree/main/test/sql">unit tests</a> also provide some simple usage examples.</p>

<p>If you want to process the Aix database files in a manner too complex for an SQL query, you can decode Aix-encoded games using the Rust crate <a href="https://crates.io/crates/aix-chess-compression"><code class="language-plaintext highlighter-rouge">aix-chess-compression</code></a>. Read the rows from a database using a crate such as <a href="https://crates.io/crates/duckdb"><code class="language-plaintext highlighter-rouge">duckdb</code></a> or <a href="https://crates.io/crates/parquet"><code class="language-plaintext highlighter-rouge">parquet</code></a>, then use <code class="language-plaintext highlighter-rouge">aix-chess-compression</code> to decode the moves/positions.</p>

<p>The rest of this blog post provides more background and details on how Aix works.</p>

<h2 id="space-efficient-storage">Space-efficient storage</h2>

<p>PGN uses storage inefficiently. An easy way to save space is by using the <a href="https://parquet.apache.org/">Parquet</a> format, a column-oriented data file format for large-scale tabular data. Each row in the file can represent a game. Lichess already offers their database as Parquet files <a href="https://huggingface.co/datasets/Lichess/standard-chess-games">on Hugging Face</a>, but the moves are still represented in the PGN format.</p>

<p>Each move in PGN format <a href="https://lichess.org/@/lichess/blog/developer-update-275-improved-game-compression/Wqa7GiAA">uses almost 6 bytes</a> on average. This is very inefficient considering that even a simple, naive encoding of the origin square, destination square, and possible promotion requires less than 2 bytes. Aix implements three compression levels for compact binary game encoding:</p>

<ul>
  <li>Low: 2 bytes per move. 6 bits for “from” square, 6 bits for “to” square, 1 bit to indicate capture, 1 bit to indicate promotion, 2 bits for promotion piece.</li>
  <li>Medium: variable number of bits per move (7.1 on average), see <a href="#details-of-medium-compression">Details of Medium compression</a>.</li>
  <li>High: variable number of bits per move (4.6 on average), based on <a href="https://lichess.org/@/lichess/blog/developer-update-275-improved-game-compression/Wqa7GiAA">Lichess’s game compression algorithm</a>, implemented using the <a href="https://crates.io/crates/chess-huffman">chess-huffman</a> Rust crate.</li>
</ul>
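
<p>To make the Low layout more concrete, here is a minimal C++ sketch that packs and unpacks one move into 16 bits. It only illustrates the field widths listed above: Aix itself implements this in Rust, and the bit order, the square numbering (a1 = 0, so e1 = 4 and g1 = 6), and all names (<code class="language-plaintext highlighter-rouge">LowMove</code>, <code class="language-plaintext highlighter-rouge">pack_low</code>, <code class="language-plaintext highlighter-rouge">unpack_low</code>) are assumptions made for this example.</p>

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout; the bit order is an assumption, not taken from Aix:
//   bits 15..10  "from" square (6 bits)
//   bits  9..4   "to" square (6 bits)
//   bit   3      capture flag
//   bit   2      promotion flag
//   bits  1..0   promotion piece (2 bits)
struct LowMove {
    std::uint8_t from;        // 0..63
    std::uint8_t to;          // 0..63
    bool capture;
    bool promotion;
    std::uint8_t promo_piece; // 0..3, only meaningful if promotion is set
};

// Pack the five fields into exactly 2 bytes.
inline std::uint16_t pack_low(const LowMove &m) {
    return static_cast<std::uint16_t>(
        ((m.from & 0x3Fu) << 10) | ((m.to & 0x3Fu) << 4) |
        (static_cast<unsigned>(m.capture) << 3) |
        (static_cast<unsigned>(m.promotion) << 2) | (m.promo_piece & 0x3u));
}

// Reverse of pack_low: extract each field with a shift and a mask.
inline LowMove unpack_low(std::uint16_t bits) {
    LowMove m;
    m.from = static_cast<std::uint8_t>((bits >> 10) & 0x3Fu);
    m.to = static_cast<std::uint8_t>((bits >> 4) & 0x3Fu);
    m.capture = ((bits >> 3) & 1u) != 0;
    m.promotion = ((bits >> 2) & 1u) != 0;
    m.promo_piece = static_cast<std::uint8_t>(bits & 0x3u);
    return m;
}
```

<p>Because every move has the same fixed width, decoding needs nothing more than shifts and masks and no knowledge of the position. That fits the description of Low as the level that takes the most space but decodes fastest.</p>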

<p>The encoded game also needs 2 bits to indicate which compression level has been used.</p>

<p>Not only the moves need to be stored efficiently, but also the clock times and engine evaluations. Nothing too fancy is used here: they are simply represented as integer lists, which can be stored directly in Parquet. Only the evaluations have one tricky aspect: an engine evaluation can either be an advantage in centipawns or a forced mate in N moves. An evaluation is stored as a 16-bit integer, and the highest and lowest 512 values represent mate, while all other values represent an advantage in centipawns. In other words: -32,767 to -32,257 represents mate in 1 to mate in 512 for black, 32,256 to 32,767 represents mate in 512 to mate in 1 for white, and everything else represents centipawns.</p>

<p>Metadata from PGN tags is stored in other columns of the most appropriate datatype.</p>

<h2 id="querying-chess-games">Querying chess games</h2>

<p>There are many query engines for Parquet, and while those can be useful for queries involving metadata (and in some cases even clock times or evaluations), they do not know the chess game encoding and do not support queries over moves or positions. This can be overcome: the analytical database system <a href="https://duckdb.org/">DuckDB</a> can process Parquet, and <a href="https://duckdb.org/docs/stable/extensions/overview">provides an extension mechanism</a> to add your own functionality (such as <a href="https://thomasd.be/2025/03/16/duckdb-extension-scalar-functions.html">scalar SQL functions</a>).</p>

<p>Adding custom functions through a DuckDB extension enables querying moves and positions of a game, or transformations of clock times and evaluations, while at the same time retaining all functionality that SQL offers – recall the heatmap query in the introduction of this post, where moves are filtered by piece and the destination squares are then aggregated. An overview of all available functions can be found in <a href="https://github.com/thomas-daniels/aix/tree/main/docs">the Aix documentation</a>.</p>

<p>With the <a href="https://github.com/thomas-daniels/aix/blob/main/docs/functions.md#scoutfish_query"><code class="language-plaintext highlighter-rouge">scoutfish_query[_plies]</code> functions</a>, Aix even has feature parity with Scoutfish (approximately). Scoutfish is still noticeably faster because of its custom index, though. In theory, a similar function could be implemented to achieve feature parity with CQL, but because of the higher complexity of CQL, this will remain theoretical for at least a while.</p>

<p>The extension is implemented in C++ while all chess-related logic is in Rust. I used <a href="https://github.com/rust-diplomat/diplomat/">Diplomat</a> to generate the FFI definitions, which was a smooth experience. Diplomat requires C++17 or above, and although DuckDB is written for C++11, it compiles fine with C++17 on all operating systems.</p>

<h2 id="details-of-medium-compression">Details of Medium compression</h2>

<p>The Medium compression level encodes each move in a variable number of bits. The first bits represent the piece that is being moved, the last bits represent how it moves. This idea is similar to <a href="https://triplehappy.wordpress.com/2015/10/26/chess-move-compression/">an encoding scheme proposed by Bill Forster</a>, but several aspects of the execution are different.</p>

<p>Each side has at most 16 pieces on the board, so the moved piece can be encoded in 4 bits or fewer. If the side to move has N pieces, then <code class="language-plaintext highlighter-rouge">ceil(log2(N))</code> bits are used to encode the moving piece. If only the king is left (N = 1), no bits need to be wasted on the moving piece.</p>

<p>The number of bits for the move depends on the piece and is determined by the maximum number of legal moves that the piece could have in any position. King and knight moves take 3 bits, rook and bishop moves 4 bits, and queen moves 5 bits. These numbers are fixed: while they may be more than needed (e.g. if a queen only has 4 moves, 2 bits would be enough in theory), reducing the number of bits requires the computationally expensive operation of calculating the legal moves.</p>

<p>The lowest number of bits needed is 3 (lone king), the highest is 9 (queen move when the moving side has 9 to 16 pieces left).</p>
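
<p>The bit counts above can be reproduced with a short C++ sketch. It only computes how many bits a move would take under the rules described in this section and does not encode anything. The names (<code class="language-plaintext highlighter-rouge">Role</code>, <code class="language-plaintext highlighter-rouge">piece_index_bits</code>, <code class="language-plaintext highlighter-rouge">move_bits</code>, <code class="language-plaintext highlighter-rouge">medium_bits</code>) are made up for this example, and pawns are left out because this post does not give a bit count for them.</p>

```cpp
#include <cassert>
#include <cstdint>

// Piece types for which the post states a fixed number of move bits.
enum class Role { King, Knight, Bishop, Rook, Queen };

// Bits needed to select one of `piece_count` pieces: ceil(log2(piece_count)).
// Returns 0 for a lone king (piece_count == 1).
inline unsigned piece_index_bits(unsigned piece_count) {
    unsigned bits = 0;
    while ((1u << bits) < piece_count) {
        ++bits;
    }
    return bits;
}

// Fixed number of bits that describe how the selected piece moves.
inline unsigned move_bits(Role role) {
    switch (role) {
    case Role::King:
    case Role::Knight:
        return 3;
    case Role::Bishop:
    case Role::Rook:
        return 4;
    case Role::Queen:
        return 5;
    }
    return 0; // not reached for valid Role values
}

// Total bits for one move: piece selector followed by the move itself.
inline unsigned medium_bits(Role role, unsigned piece_count) {
    return piece_index_bits(piece_count) + move_bits(role);
}
```

<p>For instance, <code class="language-plaintext highlighter-rouge">medium_bits(Role::King, 1)</code> gives 3 and <code class="language-plaintext highlighter-rouge">medium_bits(Role::Queen, 16)</code> gives 9, matching the minimum and maximum mentioned above.</p>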

<h2 id="try-it-out">Try it out!</h2>

<p>Give it a shot: install the <a href="https://github.com/thomas-daniels/aix">Aix extension</a> and try running queries on the <a href="https://huggingface.co/datasets/thomasd1/aix-lichess-database">database files</a>. If you run into any problems, feel free to open an issue on the <a href="https://github.com/thomas-daniels/aix">GitHub repository</a> or contact me <a href="https://bsky.app/profile/thomasd.be">on Bluesky</a>. Did something cool using Aix? Feel free to let me know as well!</p>]]></content><author><name></name></author><summary type="html"><![CDATA[The Lichess database contains over 7 billion chess games played on Lichess. To make it easier to query these games, I have released the open-source Aix extension for DuckDB and the accompanying Aix-compatible Lichess database.]]></summary></entry><entry><title type="html">Writing scalar functions in DuckDB extensions</title><link href="https://thomasd.be/2025/03/16/duckdb-extension-scalar-functions.html" rel="alternate" type="text/html" title="Writing scalar functions in DuckDB extensions" /><published>2025-03-16T13:57:00+01:00</published><updated>2025-03-16T13:57:00+01:00</updated><id>https://thomasd.be/2025/03/16/duckdb-extension-scalar-functions</id><content type="html" xml:base="https://thomasd.be/2025/03/16/duckdb-extension-scalar-functions.html"><![CDATA[<p>The analytical database system <a href="https://duckdb.org/">DuckDB</a> offers an extension mechanism allowing you to add your own functionality to the system. The C++ <a href="https://github.com/duckdb/extension-template">extension template</a> provides a starting point for extensions and contains two example scalar functions. This blog post goes into more detail on how to write such functions.</p>

<p>What I write in this blog post is what I’ve learned while experimenting with DuckDB extensions (I am not affiliated with DuckDB), so mistakes are possible :) This post is also just an introduction: there is a lot more that won’t be discussed here!</p>

<p>The examples in this post are <a href="https://github.com/thomas-daniels/duckdb-extension-scalar-function-examples">bundled in a GitHub repository</a> that is based on the extension template, so you can run it and try out all the functions yourself. The scope is limited to scalar functions, i.e. functions that take one or more values as input and return another value (such as <a href="https://duckdb.org/docs/stable/sql/functions/char.html#replacestring-source-target"><code class="language-plaintext highlighter-rouge">replace</code></a>, <a href="https://duckdb.org/docs/stable/sql/functions/numeric#expx"><code class="language-plaintext highlighter-rouge">exp</code></a>, and many more). Out-of-scope topics include aggregate functions and table functions.</p>

<ul id="markdown-toc">
  <li><a href="#the-basics" id="markdown-toc-the-basics">The basics</a></li>
  <li><a href="#types" id="markdown-toc-types">Types</a>    <ul>
      <li><a href="#blob" id="markdown-toc-blob">BLOB</a></li>
      <li><a href="#bitstring" id="markdown-toc-bitstring">BITSTRING</a></li>
      <li><a href="#date-types" id="markdown-toc-date-types">Date types</a></li>
    </ul>
  </li>
  <li><a href="#returning-null" id="markdown-toc-returning-null">Returning NULL</a></li>
  <li><a href="#nested-types-and-genericexecutor" id="markdown-toc-nested-types-and-genericexecutor">Nested types and GenericExecutor</a>    <ul>
      <li><a href="#returning-a-struct-or-list" id="markdown-toc-returning-a-struct-or-list">Returning a STRUCT or LIST</a></li>
      <li><a href="#custom-result-type-with-assignresult" id="markdown-toc-custom-result-type-with-assignresult">Custom result type with AssignResult</a></li>
      <li><a href="#taking-a-struct-as-input-argument" id="markdown-toc-taking-a-struct-as-input-argument">Taking a STRUCT as input argument</a></li>
    </ul>
  </li>
  <li><a href="#where-to-find-more-examples" id="markdown-toc-where-to-find-more-examples">Where to find more examples</a></li>
</ul>

<h2 id="the-basics">The basics</h2>

<p>We’ll start with the example that’s provided in the DuckDB extension template, the <code class="language-plaintext highlighter-rouge">quack</code> scalar function:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kr">inline</span> <span class="kt">void</span> <span class="nf">QuackScalarFun</span><span class="p">(</span><span class="n">DataChunk</span> <span class="o">&amp;</span><span class="n">args</span><span class="p">,</span> <span class="n">ExpressionState</span> <span class="o">&amp;</span><span class="n">state</span><span class="p">,</span>
                           <span class="n">Vector</span> <span class="o">&amp;</span><span class="n">result</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">auto</span> <span class="o">&amp;</span><span class="n">name_vector</span> <span class="o">=</span> <span class="n">args</span><span class="p">.</span><span class="n">data</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
  <span class="n">UnaryExecutor</span><span class="o">::</span><span class="n">Execute</span><span class="o">&lt;</span><span class="n">string_t</span><span class="p">,</span> <span class="n">string_t</span><span class="o">&gt;</span><span class="p">(</span>
      <span class="n">name_vector</span><span class="p">,</span> <span class="n">result</span><span class="p">,</span> <span class="n">args</span><span class="p">.</span><span class="n">size</span><span class="p">(),</span> <span class="p">[</span><span class="o">&amp;</span><span class="p">](</span><span class="n">string_t</span> <span class="n">name</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">return</span> <span class="n">StringVector</span><span class="o">::</span><span class="n">AddString</span><span class="p">(</span><span class="n">result</span><span class="p">,</span>
                                       <span class="s">"Quack "</span> <span class="o">+</span> <span class="n">name</span><span class="p">.</span><span class="n">GetString</span><span class="p">()</span> <span class="o">+</span> <span class="s">" 🐥"</span><span class="p">);</span>
      <span class="p">});</span>
<span class="p">}</span>

<span class="c1">// ...</span>

<span class="k">static</span> <span class="kt">void</span> <span class="nf">LoadInternal</span><span class="p">(</span><span class="n">DatabaseInstance</span> <span class="o">&amp;</span><span class="n">instance</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">auto</span> <span class="n">quack_scalar_function</span> <span class="o">=</span> <span class="n">ScalarFunction</span><span class="p">(</span>
      <span class="s">"quack"</span><span class="p">,</span> <span class="p">{</span><span class="n">LogicalType</span><span class="o">::</span><span class="n">VARCHAR</span><span class="p">},</span> <span class="n">LogicalType</span><span class="o">::</span><span class="n">VARCHAR</span><span class="p">,</span> <span class="n">QuackScalarFun</span><span class="p">);</span>
  <span class="n">ExtensionUtil</span><span class="o">::</span><span class="n">RegisterFunction</span><span class="p">(</span><span class="n">instance</span><span class="p">,</span> <span class="n">quack_scalar_function</span><span class="p">);</span>

  <span class="c1">// ...</span>
<span class="p">}</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">quack(VARCHAR) -&gt; VARCHAR</code> function is implemented by <code class="language-plaintext highlighter-rouge">QuackScalarFun</code>. This function takes three arguments:</p>

<ul>
  <li>A <a href="https://github.com/duckdb/duckdb/blob/5f5512b827df6397afd31daedb4bbdee76520019/src/include/duckdb/common/types/data_chunk.hpp"><code class="language-plaintext highlighter-rouge">DataChunk</code></a>, representing a set of input <code class="language-plaintext highlighter-rouge">Vector</code>s, one for each argument of the scalar function.</li>
  <li>An <a href="https://github.com/duckdb/duckdb/blob/5f5512b827df6397afd31daedb4bbdee76520019/src/include/duckdb/execution/expression_executor_state.hpp"><code class="language-plaintext highlighter-rouge">ExpressionState</code></a>, containing information about the query’s expression state. (This argument is beyond the scope of this post.)</li>
  <li>A <a href="https://github.com/duckdb/duckdb/blob/5f5512b827df6397afd31daedb4bbdee76520019/src/include/duckdb/common/types/vector.hpp#L78"><code class="language-plaintext highlighter-rouge">Vector</code></a> to store the result values. This is <a href="https://duckdb.org/docs/stable/internals/vector.html">a logical representation of an array with data of a single type</a>. There are different formats of vectors.</li>
</ul>

<p><code class="language-plaintext highlighter-rouge">UnaryExecutor::Execute&lt;TA, TR&gt;</code> enables efficient evaluation of a function over the contents of a vector, regardless of its format. The passed lambda expression takes an argument of type <code class="language-plaintext highlighter-rouge">TA</code> and returns a value of type <code class="language-plaintext highlighter-rouge">TR</code>. Both are <a href="https://github.com/duckdb/duckdb/blob/5f5512b827df6397afd31daedb4bbdee76520019/src/include/duckdb/common/types/string_type.hpp"><code class="language-plaintext highlighter-rouge">duckdb::string_t</code></a> in this situation because <code class="language-plaintext highlighter-rouge">quack</code> takes and returns a <code class="language-plaintext highlighter-rouge">VARCHAR</code>. The <code class="language-plaintext highlighter-rouge">string_t</code> type has a <code class="language-plaintext highlighter-rouge">GetString()</code> method to obtain an <code class="language-plaintext highlighter-rouge">std::string</code>. (<code class="language-plaintext highlighter-rouge">string_t</code> is also used for <code class="language-plaintext highlighter-rouge">BLOB</code>s; see later in this post.)</p>

<p>The call to <code class="language-plaintext highlighter-rouge">StringVector::AddString</code> is specific to functions returning a string. If you return a primitive type, such as <code class="language-plaintext highlighter-rouge">double</code>, just return the value directly.</p>

<p>In <code class="language-plaintext highlighter-rouge">LoadInternal</code>, we register our <code class="language-plaintext highlighter-rouge">quack</code> function. We create an instance of <code class="language-plaintext highlighter-rouge">ScalarFunction</code> and pass the necessary details: function name, types of input arguments, result type, and implementation. Then we do the actual registration with <code class="language-plaintext highlighter-rouge">ExtensionUtil::RegisterFunction</code>.</p>

<p>There are <a href="https://github.com/duckdb/duckdb/tree/5f5512b827df6397afd31daedb4bbdee76520019/src/include/duckdb/common/vector_operations">other executors similar to <code class="language-plaintext highlighter-rouge">UnaryExecutor</code></a> for functions with different arities. Here’s an example of a ternary function, <code class="language-plaintext highlighter-rouge">discriminant(a, b, c)</code>, using the <code class="language-plaintext highlighter-rouge">TernaryExecutor</code> to return the discriminant of the quadratic equation <code class="language-plaintext highlighter-rouge">ax^2 + bx + c = 0</code>:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kr">inline</span> <span class="kt">void</span> <span class="nf">DiscriminantScalarFun</span><span class="p">(</span><span class="n">DataChunk</span> <span class="o">&amp;</span><span class="n">args</span><span class="p">,</span> <span class="n">ExpressionState</span> <span class="o">&amp;</span><span class="n">state</span><span class="p">,</span>
                                  <span class="n">Vector</span> <span class="o">&amp;</span><span class="n">result</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">auto</span> <span class="o">&amp;</span><span class="n">a_vector</span> <span class="o">=</span> <span class="n">args</span><span class="p">.</span><span class="n">data</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
  <span class="k">auto</span> <span class="o">&amp;</span><span class="n">b_vector</span> <span class="o">=</span> <span class="n">args</span><span class="p">.</span><span class="n">data</span><span class="p">[</span><span class="mi">1</span><span class="p">];</span>
  <span class="k">auto</span> <span class="o">&amp;</span><span class="n">c_vector</span> <span class="o">=</span> <span class="n">args</span><span class="p">.</span><span class="n">data</span><span class="p">[</span><span class="mi">2</span><span class="p">];</span>

  <span class="n">TernaryExecutor</span><span class="o">::</span><span class="n">Execute</span><span class="o">&lt;</span><span class="kt">double</span><span class="p">,</span> <span class="kt">double</span><span class="p">,</span> <span class="kt">double</span><span class="p">,</span> <span class="kt">double</span><span class="o">&gt;</span><span class="p">(</span>
      <span class="n">a_vector</span><span class="p">,</span> <span class="n">b_vector</span><span class="p">,</span> <span class="n">c_vector</span><span class="p">,</span> <span class="n">result</span><span class="p">,</span> <span class="n">args</span><span class="p">.</span><span class="n">size</span><span class="p">(),</span>
      <span class="p">[</span><span class="o">&amp;</span><span class="p">](</span><span class="kt">double</span> <span class="n">a</span><span class="p">,</span> <span class="kt">double</span> <span class="n">b</span><span class="p">,</span> <span class="kt">double</span> <span class="n">c</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">auto</span> <span class="n">discriminant</span> <span class="o">=</span> <span class="n">b</span> <span class="o">*</span> <span class="n">b</span> <span class="o">-</span> <span class="mi">4</span> <span class="o">*</span> <span class="n">a</span> <span class="o">*</span> <span class="n">c</span><span class="p">;</span>
        <span class="k">return</span> <span class="n">discriminant</span><span class="p">;</span>
      <span class="p">});</span>
<span class="p">}</span>

<span class="c1">// ...</span>

<span class="k">static</span> <span class="kt">void</span> <span class="nf">LoadInternal</span><span class="p">(</span><span class="n">DatabaseInstance</span> <span class="o">&amp;</span><span class="n">instance</span><span class="p">)</span> <span class="p">{</span>
  <span class="c1">// ...</span>

  <span class="k">auto</span> <span class="n">discriminant_scalar_function</span> <span class="o">=</span> <span class="n">ScalarFunction</span><span class="p">(</span>
      <span class="s">"discriminant"</span><span class="p">,</span>
      <span class="p">{</span><span class="n">LogicalType</span><span class="o">::</span><span class="n">DOUBLE</span><span class="p">,</span> <span class="n">LogicalType</span><span class="o">::</span><span class="n">DOUBLE</span><span class="p">,</span> <span class="n">LogicalType</span><span class="o">::</span><span class="n">DOUBLE</span><span class="p">},</span>
      <span class="n">LogicalType</span><span class="o">::</span><span class="n">DOUBLE</span><span class="p">,</span> <span class="n">DiscriminantScalarFun</span><span class="p">);</span>
  <span class="n">ExtensionUtil</span><span class="o">::</span><span class="n">RegisterFunction</span><span class="p">(</span><span class="n">instance</span><span class="p">,</span> <span class="n">discriminant_scalar_function</span><span class="p">);</span>

  <span class="c1">// ...</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Build the extension and give it a try:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>D select discriminant(2, -15, 8);
┌─────────────────────────┐
│ discriminant(2, -15, 8) │
│         double          │
├─────────────────────────┤
│          161.0          │
└─────────────────────────┘
</code></pre></div></div>

<h2 id="types">Types</h2>

<p>Some of the SQL types have a straightforward equivalent in C++ (e.g., <code class="language-plaintext highlighter-rouge">DOUBLE</code> -&gt; <code class="language-plaintext highlighter-rouge">double</code> as used in the previous example, <code class="language-plaintext highlighter-rouge">INTEGER</code> -&gt; <code class="language-plaintext highlighter-rouge">int32_t</code>). Others may need some extra explanation and are discussed here.</p>

<h3 id="blob">BLOB</h3>

<p>Just like <code class="language-plaintext highlighter-rouge">VARCHAR</code>s, <code class="language-plaintext highlighter-rouge">BLOB</code>s are represented with <code class="language-plaintext highlighter-rouge">duckdb::string_t</code>. As we have seen in the <code class="language-plaintext highlighter-rouge">quack</code> example, this type provides a <code class="language-plaintext highlighter-rouge">GetString</code> method, but when dealing with <code class="language-plaintext highlighter-rouge">BLOB</code>s, we likely want to work with the raw bytes instead. For that purpose, <code class="language-plaintext highlighter-rouge">string_t</code> also provides the <code class="language-plaintext highlighter-rouge">const char* GetData()</code> and <code class="language-plaintext highlighter-rouge">idx_t GetSize()</code> methods. <code class="language-plaintext highlighter-rouge">GetData</code> returns a pointer to <code class="language-plaintext highlighter-rouge">char</code> (which is signed on many platforms); to obtain a byte pointer (<code class="language-plaintext highlighter-rouge">const uint8_t *</code>) instead, use <code class="language-plaintext highlighter-rouge">const_data_ptr_cast(yourBlob.GetData())</code>.</p>
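<p>To see why the unsigned view matters, here is a minimal standalone sketch in plain C++ (no DuckDB headers). The <code class="language-plaintext highlighter-rouge">ToHex</code> helper is hypothetical, and the <code class="language-plaintext highlighter-rouge">reinterpret_cast</code> merely stands in for <code class="language-plaintext highlighter-rouge">const_data_ptr_cast</code>; it assumes the pointer and size have already been obtained through <code class="language-plaintext highlighter-rouge">GetData()</code> and <code class="language-plaintext highlighter-rouge">GetSize()</code>.</p>

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: hex-encode a blob, given the pointer and size that
// string_t::GetData() and string_t::GetSize() would hand us.
std::string ToHex(const char *data, std::size_t size) {
  static const char digits[] = "0123456789abcdef";
  // Stand-in for const_data_ptr_cast: view the same memory as unsigned bytes.
  const uint8_t *bytes = reinterpret_cast<const uint8_t *>(data);
  std::string out;
  out.reserve(size * 2);
  for (std::size_t i = 0; i < size; i++) {
    out.push_back(digits[bytes[i] >> 4]);   // high nibble, always 0..15
    out.push_back(digits[bytes[i] & 0x0F]); // low nibble
  }
  return out;
}
```

<p>With unsigned bytes, the shifted value always lands in the range 0 to 15. With a signed <code class="language-plaintext highlighter-rouge">char</code>, a byte such as <code class="language-plaintext highlighter-rouge">0xFF</code> would be negative and index outside the lookup table.</p>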

<h3 id="bitstring">BITSTRING</h3>

<p>A <a href="https://duckdb.org/docs/stable/sql/data_types/bitstring.html"><code class="language-plaintext highlighter-rouge">BITSTRING</code></a> is a variable-length string of 1s and 0s. This type is also represented with <code class="language-plaintext highlighter-rouge">string_t</code>, encoded in the manner <a href="https://github.com/duckdb/duckdb/blob/main/src/common/types/bit.cpp#L185">described in src/common/types/bit.cpp</a>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>First byte in bitstring contains amount of padded bits,
second byte in bitstring is the padded byte,
therefore the rest of the data starts at data + 2 (third byte)
</code></pre></div></div>
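<p>As a rough illustration of that layout, the standalone sketch below decodes such a byte sequence into a string of <code class="language-plaintext highlighter-rouge">'0'</code> and <code class="language-plaintext highlighter-rouge">'1'</code> characters. The <code class="language-plaintext highlighter-rouge">BitsToString</code> name is hypothetical, and the sketch assumes that the padded bits sit at the most significant end of the second byte; that assumption is worth verifying against <code class="language-plaintext highlighter-rouge">bit.cpp</code> before relying on it.</p>

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Decode the raw bytes of a BITSTRING into a string of '0' and '1' characters.
// Assumption: the padded bits are the most significant bits of the second byte.
std::string BitsToString(const std::vector<uint8_t> &raw) {
  std::string out;
  if (raw.size() < 2) {
    return out; // too short for the layout described above
  }
  const unsigned padding = raw[0]; // first byte: number of padded bits
  // Second byte: skip the padded bits, keep the remaining ones.
  for (unsigned bit = padding; bit < 8; bit++) {
    out.push_back((raw[1] >> (7 - bit)) & 1 ? '1' : '0');
  }
  // Every later byte contributes all eight bits, most significant first.
  for (std::size_t i = 2; i < raw.size(); i++) {
    for (int bit = 7; bit >= 0; bit--) {
      out.push_back((raw[i] >> bit) & 1 ? '1' : '0');
    }
  }
  return out;
}
```

<p>In an actual extension, the input would come from <code class="language-plaintext highlighter-rouge">GetData()</code> and <code class="language-plaintext highlighter-rouge">GetSize()</code> on the <code class="language-plaintext highlighter-rouge">string_t</code> rather than from a <code class="language-plaintext highlighter-rouge">std::vector</code>.</p>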

<h3 id="date-types">Date types</h3>

<p>A <code class="language-plaintext highlighter-rouge">DATE</code> is represented using the <a href="https://github.com/duckdb/duckdb/blob/5f5512b827df6397afd31daedb4bbdee76520019/src/include/duckdb/common/types/date.hpp"><code class="language-plaintext highlighter-rouge">date_t</code></a> struct, of which the <code class="language-plaintext highlighter-rouge">days</code> field contains the number of days since 1970-01-01.</p>

<p>An <code class="language-plaintext highlighter-rouge">INTERVAL</code> is represented using the <a href="https://github.com/duckdb/duckdb/blob/5f5512b827df6397afd31daedb4bbdee76520019/src/include/duckdb/common/types/interval.hpp#L24"><code class="language-plaintext highlighter-rouge">interval_t</code></a> struct, which has three fields you can make use of: <code class="language-plaintext highlighter-rouge">months</code>, <code class="language-plaintext highlighter-rouge">days</code>, and <code class="language-plaintext highlighter-rouge">micros</code>.</p>

<p>A <code class="language-plaintext highlighter-rouge">TIMESTAMP</code> is represented using the <a href="https://github.com/duckdb/duckdb/blob/5f5512b827df6397afd31daedb4bbdee76520019/src/include/duckdb/common/types/timestamp.hpp"><code class="language-plaintext highlighter-rouge">timestamp_t</code></a> struct, of which the <code class="language-plaintext highlighter-rouge">value</code> field contains the number of microseconds since 1970-01-01. The same file contains structs for the other timestamp types.</p>

<p>All of these structs can be used as input or result type with <code class="language-plaintext highlighter-rouge">Execute</code>.</p>
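<p>Because these structs are thin wrappers around integers, conversions between them are plain arithmetic. The sketch below uses simplified stand-in structs (<code class="language-plaintext highlighter-rouge">DateSketch</code> and <code class="language-plaintext highlighter-rouge">TimestampSketch</code>, both hypothetical) that only mirror the <code class="language-plaintext highlighter-rouge">days</code> and <code class="language-plaintext highlighter-rouge">value</code> fields named above; real extension code would use the DuckDB types instead.</p>

```cpp
#include <cstdint>

// Simplified stand-ins mirroring only the fields named above;
// these are NOT the real DuckDB definitions.
struct DateSketch {
  int32_t days; // days since 1970-01-01
};
struct TimestampSketch {
  int64_t value; // microseconds since 1970-01-01
};

constexpr int64_t kMicrosPerDay = 86400LL * 1000 * 1000;

// Truncate a timestamp to its calendar date. Integer division in C++ rounds
// toward zero, so negative (pre-1970) values need an explicit floor step.
DateSketch DateOf(TimestampSketch ts) {
  int64_t days = ts.value / kMicrosPerDay;
  if (ts.value % kMicrosPerDay < 0) {
    days -= 1;
  }
  return DateSketch{static_cast<int32_t>(days)};
}
```

<p>The floor step is the part that is easy to miss: one microsecond before the epoch belongs to 1969-12-31 (day -1), not to day 0.</p>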

<h2 id="returning-null">Returning NULL</h2>

<p>The <code class="language-plaintext highlighter-rouge">Execute</code> method from the previous examples automatically sets the result to NULL if any of the input arguments is NULL. However, if we want to return NULL ourselves, we need to use the <code class="language-plaintext highlighter-rouge">ExecuteWithNulls</code> method instead. Its usage is similar to <code class="language-plaintext highlighter-rouge">Execute</code>, but the lambda takes two extra arguments: a <a href="https://github.com/duckdb/duckdb/blob/5f5512b827df6397afd31daedb4bbdee76520019/src/include/duckdb/common/types/validity_mask.hpp"><code class="language-plaintext highlighter-rouge">ValidityMask</code></a> and an index, which together can be used to mark a vector element as “invalid” (NULL).</p>

<p>Let’s demonstrate this by adding a function <code class="language-plaintext highlighter-rouge">fibonacci(INTEGER) -&gt; BIGINT</code> that returns the n’th Fibonacci number, except if <code class="language-plaintext highlighter-rouge">n &lt; 0</code> (undefined) or <code class="language-plaintext highlighter-rouge">n &gt; 92</code> (result does not fit in a 64-bit integer):</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kr">inline</span> <span class="kt">void</span> <span class="nf">FibonacciScalarFun</span><span class="p">(</span><span class="n">DataChunk</span> <span class="o">&amp;</span><span class="n">args</span><span class="p">,</span> <span class="n">ExpressionState</span> <span class="o">&amp;</span><span class="n">state</span><span class="p">,</span>
                               <span class="n">Vector</span> <span class="o">&amp;</span><span class="n">result</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">auto</span> <span class="o">&amp;</span><span class="n">input_vector</span> <span class="o">=</span> <span class="n">args</span><span class="p">.</span><span class="n">data</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>

  <span class="n">UnaryExecutor</span><span class="o">::</span><span class="n">ExecuteWithNulls</span><span class="o">&lt;</span><span class="kt">int32_t</span><span class="p">,</span> <span class="kt">int64_t</span><span class="o">&gt;</span><span class="p">(</span>
      <span class="n">input_vector</span><span class="p">,</span> <span class="n">result</span><span class="p">,</span> <span class="n">args</span><span class="p">.</span><span class="n">size</span><span class="p">(),</span>
      <span class="p">[</span><span class="o">&amp;</span><span class="p">](</span><span class="kt">int32_t</span> <span class="n">input</span><span class="p">,</span> <span class="n">ValidityMask</span> <span class="o">&amp;</span><span class="n">mask</span><span class="p">,</span> <span class="n">idx_t</span> <span class="n">idx</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="kt">int64_t</span> <span class="p">{</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">input</span> <span class="o">&gt;=</span> <span class="mi">0</span> <span class="o">&amp;&amp;</span> <span class="n">input</span> <span class="o">&lt;</span> <span class="mi">93</span><span class="p">)</span> <span class="p">{</span>
          <span class="c1">// Iterate with exact integers: Binet's closed form in double</span>
          <span class="c1">// precision is no longer exact for the largest supported n.</span>
          <span class="kt">uint64_t</span> <span class="n">a</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span> <span class="n">b</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
          <span class="k">for</span> <span class="p">(</span><span class="kt">int32_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o">&lt;</span> <span class="n">input</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
            <span class="kt">uint64_t</span> <span class="n">next</span> <span class="o">=</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span><span class="p">;</span>
            <span class="n">a</span> <span class="o">=</span> <span class="n">b</span><span class="p">;</span>
            <span class="n">b</span> <span class="o">=</span> <span class="n">next</span><span class="p">;</span>
          <span class="p">}</span>
          <span class="k">return</span> <span class="k">static_cast</span><span class="o">&lt;</span><span class="kt">int64_t</span><span class="o">&gt;</span><span class="p">(</span><span class="n">a</span><span class="p">);</span>
        <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
          <span class="n">mask</span><span class="p">.</span><span class="n">SetInvalid</span><span class="p">(</span><span class="n">idx</span><span class="p">);</span>
          <span class="k">return</span> <span class="mi">0l</span><span class="p">;</span>
        <span class="p">}</span>
      <span class="p">});</span>
<span class="p">}</span>

<span class="c1">// ...</span>

<span class="k">static</span> <span class="kt">void</span> <span class="nf">LoadInternal</span><span class="p">(</span><span class="n">DatabaseInstance</span> <span class="o">&amp;</span><span class="n">instance</span><span class="p">)</span> <span class="p">{</span>
  <span class="c1">// ...</span>

  <span class="k">auto</span> <span class="n">fibonacci_scalar_function</span> <span class="o">=</span>
      <span class="n">ScalarFunction</span><span class="p">(</span><span class="s">"fibonacci"</span><span class="p">,</span> <span class="p">{</span><span class="n">LogicalType</span><span class="o">::</span><span class="n">INTEGER</span><span class="p">},</span> <span class="n">LogicalType</span><span class="o">::</span><span class="n">BIGINT</span><span class="p">,</span>
                     <span class="n">FibonacciScalarFun</span><span class="p">);</span>
  <span class="n">ExtensionUtil</span><span class="o">::</span><span class="n">RegisterFunction</span><span class="p">(</span><span class="n">instance</span><span class="p">,</span> <span class="n">fibonacci_scalar_function</span><span class="p">);</span>

  <span class="c1">// ...</span>
<span class="p">}</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>D select fibonacci(5), fibonacci(-3), fibonacci(100);
┌──────────────┬───────────────┬────────────────┐
│ fibonacci(5) │ fibonacci(-3) │ fibonacci(100) │
│    int64     │     int64     │     int64      │
├──────────────┼───────────────┼────────────────┤
│      5       │     NULL      │      NULL      │
└──────────────┴───────────────┴────────────────┘
</code></pre></div></div>

<p>Setting the result to NULL only takes a call to <code class="language-plaintext highlighter-rouge">mask.SetInvalid(idx)</code>.</p>

<p>(Note that if you want to handle NULL as <em>input</em> differently than “NULL in, NULL out”, neither <code class="language-plaintext highlighter-rouge">Execute</code> nor <code class="language-plaintext highlighter-rouge">ExecuteWithNulls</code> can help and you’ll need to operate on the vectors directly.)</p>
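<p>To make the control flow more tangible, here is a toy model of this behaviour in plain C++, without any DuckDB headers. All names (<code class="language-plaintext highlighter-rouge">ToyColumn</code>, <code class="language-plaintext highlighter-rouge">ToyMask</code>, <code class="language-plaintext highlighter-rouge">ToyExecuteWithNulls</code>, <code class="language-plaintext highlighter-rouge">FibOrNull</code>) are hypothetical, and the sketch only illustrates the idea of a per-row validity flag; it says nothing about how DuckDB implements its executors internally.</p>

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy column: one value and one validity flag per row (false means NULL).
struct ToyColumn {
  std::vector<int64_t> values;
  std::vector<bool> valid;
};

// Toy stand-in for a validity mask, exposing only SetInvalid.
struct ToyMask {
  std::vector<bool> &valid;
  void SetInvalid(std::size_t idx) { valid[idx] = false; }
};

// Toy model of ExecuteWithNulls: NULL inputs stay NULL without calling fun,
// and fun may additionally mark its own row as NULL through the mask.
template <class FUNC>
ToyColumn ToyExecuteWithNulls(const ToyColumn &input, FUNC fun) {
  const std::size_t count = input.values.size();
  // The result starts with the input's validity: "NULL in, NULL out".
  ToyColumn result{std::vector<int64_t>(count, 0), input.valid};
  ToyMask mask{result.valid};
  for (std::size_t i = 0; i < count; i++) {
    if (result.valid[i]) {
      result.values[i] = fun(input.values[i], mask, i);
    }
  }
  return result;
}

// Exact Fibonacci with the range check described above (0 <= n <= 92).
int64_t FibOrNull(int64_t n, ToyMask &mask, std::size_t idx) {
  if (n < 0 || n > 92) {
    mask.SetInvalid(idx);
    return 0; // placeholder; the row is NULL anyway
  }
  uint64_t a = 0, b = 1;
  for (int64_t i = 0; i < n; i++) {
    uint64_t next = a + b;
    a = b;
    b = next;
  }
  return static_cast<int64_t>(a);
}
```

<p>Calling <code class="language-plaintext highlighter-rouge">ToyExecuteWithNulls(column, FibOrNull)</code> leaves a row NULL either because its input was already NULL or because the callback marked it invalid, which are the two cases distinguished in this section.</p>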

<h2 id="nested-types-and-genericexecutor">Nested types and GenericExecutor</h2>

<p>The <code class="language-plaintext highlighter-rouge">...Executor</code> types discussed so far do not provide a means to return <em>nested types</em>, such as <code class="language-plaintext highlighter-rouge">STRUCT</code> or <code class="language-plaintext highlighter-rouge">LIST</code>. That is where the <a href="https://github.com/duckdb/duckdb/blob/5f5512b827df6397afd31daedb4bbdee76520019/src/include/duckdb/common/vector_operations/generic_executor.hpp#L246"><code class="language-plaintext highlighter-rouge">GenericExecutor</code></a> comes into play.</p>

<h3 id="returning-a-struct-or-list">Returning a STRUCT or LIST</h3>

<p>We’re going to write a function <code class="language-plaintext highlighter-rouge">solve_quadratic_equation(DOUBLE, DOUBLE, DOUBLE) -&gt; STRUCT(x1 DOUBLE, x2 DOUBLE)</code> that returns the solutions to the quadratic equation <code class="language-plaintext highlighter-rouge">ax^2 + bx + c = 0</code>. The structure looks similar to what we’ve seen before, but pay attention to the template parameters specifying the input and result types:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include</span> <span class="cpf">"duckdb/common/vector_operations/generic_executor.hpp"</span><span class="cp">
</span>
<span class="c1">// ...</span>

<span class="kr">inline</span> <span class="kt">void</span> <span class="nf">SolveQuadraticEquationScalarFunc</span><span class="p">(</span><span class="n">DataChunk</span> <span class="o">&amp;</span><span class="n">args</span><span class="p">,</span>
                                             <span class="n">ExpressionState</span> <span class="o">&amp;</span><span class="n">state</span><span class="p">,</span>
                                             <span class="n">Vector</span> <span class="o">&amp;</span><span class="n">result</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">auto</span> <span class="o">&amp;</span><span class="n">a_vector</span> <span class="o">=</span> <span class="n">args</span><span class="p">.</span><span class="n">data</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
  <span class="k">auto</span> <span class="o">&amp;</span><span class="n">b_vector</span> <span class="o">=</span> <span class="n">args</span><span class="p">.</span><span class="n">data</span><span class="p">[</span><span class="mi">1</span><span class="p">];</span>
  <span class="k">auto</span> <span class="o">&amp;</span><span class="n">c_vector</span> <span class="o">=</span> <span class="n">args</span><span class="p">.</span><span class="n">data</span><span class="p">[</span><span class="mi">2</span><span class="p">];</span>

  <span class="n">GenericExecutor</span><span class="o">::</span><span class="n">ExecuteTernary</span><span class="o">&lt;</span>
      <span class="n">PrimitiveType</span><span class="o">&lt;</span><span class="kt">double</span><span class="o">&gt;</span><span class="p">,</span> <span class="n">PrimitiveType</span><span class="o">&lt;</span><span class="kt">double</span><span class="o">&gt;</span><span class="p">,</span>
      <span class="n">PrimitiveType</span><span class="o">&lt;</span><span class="kt">double</span><span class="o">&gt;</span><span class="p">,</span> <span class="n">StructTypeBinary</span><span class="o">&lt;</span><span class="kt">double</span><span class="p">,</span> <span class="kt">double</span><span class="o">&gt;&gt;</span><span class="p">(</span>
      <span class="n">a_vector</span><span class="p">,</span> <span class="n">b_vector</span><span class="p">,</span> <span class="n">c_vector</span><span class="p">,</span> <span class="n">result</span><span class="p">,</span> <span class="n">args</span><span class="p">.</span><span class="n">size</span><span class="p">(),</span>
      <span class="p">[</span><span class="o">&amp;</span><span class="p">](</span><span class="n">PrimitiveType</span><span class="o">&lt;</span><span class="kt">double</span><span class="o">&gt;</span> <span class="n">a</span><span class="p">,</span> <span class="n">PrimitiveType</span><span class="o">&lt;</span><span class="kt">double</span><span class="o">&gt;</span> <span class="n">b</span><span class="p">,</span>
          <span class="n">PrimitiveType</span><span class="o">&lt;</span><span class="kt">double</span><span class="o">&gt;</span> <span class="n">c</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">auto</span> <span class="n">discriminant</span> <span class="o">=</span> <span class="n">b</span><span class="p">.</span><span class="n">val</span> <span class="o">*</span> <span class="n">b</span><span class="p">.</span><span class="n">val</span> <span class="o">-</span> <span class="mi">4</span> <span class="o">*</span> <span class="n">a</span><span class="p">.</span><span class="n">val</span> <span class="o">*</span> <span class="n">c</span><span class="p">.</span><span class="n">val</span><span class="p">;</span>
        <span class="n">StructTypeBinary</span><span class="o">&lt;</span><span class="kt">double</span><span class="p">,</span> <span class="kt">double</span><span class="o">&gt;</span> <span class="n">solution</span><span class="p">;</span>
        <span class="n">solution</span><span class="p">.</span><span class="n">a_val</span> <span class="o">=</span> <span class="p">(</span><span class="o">-</span><span class="n">b</span><span class="p">.</span><span class="n">val</span> <span class="o">+</span> <span class="n">sqrt</span><span class="p">(</span><span class="n">discriminant</span><span class="p">))</span> <span class="o">/</span> <span class="p">(</span><span class="mi">2</span> <span class="o">*</span> <span class="n">a</span><span class="p">.</span><span class="n">val</span><span class="p">);</span>
        <span class="n">solution</span><span class="p">.</span><span class="n">b_val</span> <span class="o">=</span> <span class="p">(</span><span class="o">-</span><span class="n">b</span><span class="p">.</span><span class="n">val</span> <span class="o">-</span> <span class="n">sqrt</span><span class="p">(</span><span class="n">discriminant</span><span class="p">))</span> <span class="o">/</span> <span class="p">(</span><span class="mi">2</span> <span class="o">*</span> <span class="n">a</span><span class="p">.</span><span class="n">val</span><span class="p">);</span>
        <span class="k">return</span> <span class="n">solution</span><span class="p">;</span>
      <span class="p">});</span>
<span class="p">}</span>

<span class="k">static</span> <span class="kt">void</span> <span class="nf">LoadInternal</span><span class="p">(</span><span class="n">DatabaseInstance</span> <span class="o">&amp;</span><span class="n">instance</span><span class="p">)</span> <span class="p">{</span>
  <span class="c1">// ...</span>

  <span class="n">child_list_t</span><span class="o">&lt;</span><span class="n">LogicalType</span><span class="o">&gt;</span> <span class="n">quadratic_equation_solution_child_types</span><span class="p">;</span>
  <span class="n">quadratic_equation_solution_child_types</span><span class="p">.</span><span class="n">push_back</span><span class="p">(</span>
      <span class="n">std</span><span class="o">::</span><span class="n">make_pair</span><span class="p">(</span><span class="s">"x1"</span><span class="p">,</span> <span class="n">LogicalType</span><span class="o">::</span><span class="n">DOUBLE</span><span class="p">));</span>
  <span class="n">quadratic_equation_solution_child_types</span><span class="p">.</span><span class="n">push_back</span><span class="p">(</span>
      <span class="n">std</span><span class="o">::</span><span class="n">make_pair</span><span class="p">(</span><span class="s">"x2"</span><span class="p">,</span> <span class="n">LogicalType</span><span class="o">::</span><span class="n">DOUBLE</span><span class="p">));</span>
  <span class="k">auto</span> <span class="n">solve_quadratic_equation_scalar_function</span> <span class="o">=</span> <span class="n">ScalarFunction</span><span class="p">(</span>
      <span class="s">"solve_quadratic_equation"</span><span class="p">,</span>
      <span class="p">{</span><span class="n">LogicalType</span><span class="o">::</span><span class="n">DOUBLE</span><span class="p">,</span> <span class="n">LogicalType</span><span class="o">::</span><span class="n">DOUBLE</span><span class="p">,</span> <span class="n">LogicalType</span><span class="o">::</span><span class="n">DOUBLE</span><span class="p">},</span>
      <span class="n">LogicalType</span><span class="o">::</span><span class="n">STRUCT</span><span class="p">(</span><span class="n">quadratic_equation_solution_child_types</span><span class="p">),</span>
      <span class="n">SolveQuadraticEquationScalarFunc</span><span class="p">);</span>
  <span class="n">ExtensionUtil</span><span class="o">::</span><span class="n">RegisterFunction</span><span class="p">(</span><span class="n">instance</span><span class="p">,</span>
                                  <span class="n">solve_quadratic_equation_scalar_function</span><span class="p">);</span>

  <span class="c1">// ...</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Trying it out:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>D select solve_quadratic_equation(1, -7, 12);
┌─────────────────────────────────────┐
│ solve_quadratic_equation(1, -7, 12) │
│    struct(x1 double, x2 double)     │
├─────────────────────────────────────┤
│ {'x1': 4.0, 'x2': 3.0}              │
└─────────────────────────────────────┘
</code></pre></div></div>

<p>Similar to <code class="language-plaintext highlighter-rouge">TernaryExecutor::Execute</code>, <code class="language-plaintext highlighter-rouge">GenericExecutor::ExecuteTernary</code> takes four template parameters, three for the input types and one for the output type. They look a bit different though: <code class="language-plaintext highlighter-rouge">PrimitiveType&lt;INPUT_TYPE&gt;</code> represents a primitive type, and <code class="language-plaintext highlighter-rouge">StructTypeBinary&lt;A_TYPE, B_TYPE&gt;</code> represents a <code class="language-plaintext highlighter-rouge">STRUCT</code> type with two fields. <code class="language-plaintext highlighter-rouge">PrimitiveType</code> has a <code class="language-plaintext highlighter-rouge">val</code> field containing the underlying value, and <code class="language-plaintext highlighter-rouge">StructTypeBinary</code> has an <code class="language-plaintext highlighter-rouge">a_val</code> and <code class="language-plaintext highlighter-rouge">b_val</code> field for the two values in the struct (which, to be clear, have nothing to do with our <code class="language-plaintext highlighter-rouge">a</code> and <code class="language-plaintext highlighter-rouge">b</code> input values). Similar types also exist for <code class="language-plaintext highlighter-rouge">STRUCT</code>s <a href="https://github.com/duckdb/duckdb/blob/5f5512b827df6397afd31daedb4bbdee76520019/src/include/duckdb/common/vector_operations/generic_executor.hpp#L68">with different arities</a>.</p>

<p>The registration of the function in <code class="language-plaintext highlighter-rouge">LoadInternal</code> is mostly the same as for the other functions; only the description of the result type is more extensive. The names and types of the <code class="language-plaintext highlighter-rouge">STRUCT</code>’s fields have to be constructed as a <code class="language-plaintext highlighter-rouge">child_list_t&lt;LogicalType&gt;</code> and passed to <code class="language-plaintext highlighter-rouge">LogicalType::STRUCT</code>.</p>

<p>Returning a <code class="language-plaintext highlighter-rouge">LIST</code> is very similar: instead of <code class="language-plaintext highlighter-rouge">StructTypeBinary</code>, use <a href="https://github.com/duckdb/duckdb/blob/5f5512b827df6397afd31daedb4bbdee76520019/src/include/duckdb/common/vector_operations/generic_executor.hpp#L217"><code class="language-plaintext highlighter-rouge">GenericListType&lt;CHILD_TYPE&gt;</code></a>, which has a <code class="language-plaintext highlighter-rouge">values</code> field that is an <code class="language-plaintext highlighter-rouge">std::vector</code> of instances of the child type. As <code class="language-plaintext highlighter-rouge">LogicalType</code> in the <code class="language-plaintext highlighter-rouge">ScalarFunction</code> constructor, use <code class="language-plaintext highlighter-rouge">LogicalType::LIST(&lt;child LogicalType&gt;)</code>.</p>

<h3 id="custom-result-type-with-assignresult">Custom result type with AssignResult</h3>

<p>The implementation of <code class="language-plaintext highlighter-rouge">solve_quadratic_equation</code> is definitely not perfect. If we apply it to an equation with no real solutions, e.g. <code class="language-plaintext highlighter-rouge">solve_quadratic_equation(1, 2, 3)</code>, the result is <code class="language-plaintext highlighter-rouge">{'x1': -nan, 'x2': -nan}</code>. And if we choose <code class="language-plaintext highlighter-rouge">0</code> for the first argument, the result is <code class="language-plaintext highlighter-rouge">{'x1': -nan, 'x2': -inf}</code>, while the equation has actually become a linear equation with <code class="language-plaintext highlighter-rouge">x = -c / b</code> as solution (assuming <code class="language-plaintext highlighter-rouge">b</code> isn’t 0 as well).</p>
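<p>Both failure modes come straight from IEEE 754 floating-point arithmetic and can be reproduced outside DuckDB. The standalone sketch below repeats the arithmetic of the lambda; <code class="language-plaintext highlighter-rouge">Roots</code>, <code class="language-plaintext highlighter-rouge">NaiveSolve</code> and <code class="language-plaintext highlighter-rouge">HasRealRoots</code> are hypothetical names, and <code class="language-plaintext highlighter-rouge">HasRealRoots</code> is just one possible validity check.</p>

```cpp
#include <cmath>

struct Roots {
  double x1;
  double x2;
};

// Same arithmetic as the lambda above, without any input validation.
Roots NaiveSolve(double a, double b, double c) {
  double discriminant = b * b - 4 * a * c;
  return Roots{(-b + std::sqrt(discriminant)) / (2 * a),
               (-b - std::sqrt(discriminant)) / (2 * a)};
}

// True only when the formula yields two finite real numbers:
// a must be nonzero and the discriminant must not be negative.
bool HasRealRoots(double a, double b, double c) {
  return a != 0 && b * b - 4 * a * c >= 0;
}
```

<p>For <code class="language-plaintext highlighter-rouge">(1, 2, 3)</code> the square root of a negative discriminant produces NaN in both fields. For <code class="language-plaintext highlighter-rouge">(0, 2, 3)</code> the first field is <code class="language-plaintext highlighter-rouge">0 / 0</code> (NaN) and the second is <code class="language-plaintext highlighter-rouge">-4 / 0</code> (negative infinity).</p>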

<p>Let’s say we want to return NULL in those situations, rather than the struct. <code class="language-plaintext highlighter-rouge">GenericExecutor</code> does not have an equivalent of the <code class="language-plaintext highlighter-rouge">ExecuteWithNulls</code> method, but we can take another approach. Instead of choosing <code class="language-plaintext highlighter-rouge">StructTypeBinary</code> as result type, we can create our own type. This type must have a static method <code class="language-plaintext highlighter-rouge">AssignResult</code> that assigns an instance to the result vector. We can use <a href="https://github.com/duckdb/duckdb/blob/5f5512b827df6397afd31daedb4bbdee76520019/src/include/duckdb/common/vector_operations/generic_executor.hpp#L115"><code class="language-plaintext highlighter-rouge">StructTypeBinary</code>’s implementation of this method</a> as inspiration for our custom type:</p>

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nc">QuadraticEquationSolution</span> <span class="p">{</span>
  <span class="kt">double</span> <span class="n">x1</span><span class="p">;</span>
  <span class="kt">double</span> <span class="n">x2</span><span class="p">;</span>
  <span class="kt">bool</span> <span class="n">exists</span><span class="p">;</span>

  <span class="k">static</span> <span class="kt">void</span> <span class="n">AssignResult</span><span class="p">(</span><span class="n">Vector</span> <span class="o">&amp;</span><span class="n">result</span><span class="p">,</span> <span class="n">idx_t</span> <span class="n">i</span><span class="p">,</span>
                           <span class="n">QuadraticEquationSolution</span> <span class="n">solution</span><span class="p">)</span> <span class="p">{</span>
    <span class="k">auto</span> <span class="o">&amp;</span><span class="n">entries</span> <span class="o">=</span> <span class="n">StructVector</span><span class="o">::</span><span class="n">GetEntries</span><span class="p">(</span><span class="n">result</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">solution</span><span class="p">.</span><span class="n">exists</span><span class="p">)</span> <span class="p">{</span>
      <span class="n">FlatVector</span><span class="o">::</span><span class="n">SetNull</span><span class="p">(</span><span class="n">result</span><span class="p">,</span> <span class="n">i</span><span class="p">,</span> <span class="nb">true</span><span class="p">);</span>
    <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
      <span class="n">FlatVector</span><span class="o">::</span><span class="n">GetData</span><span class="o">&lt;</span><span class="kt">double</span><span class="o">&gt;</span><span class="p">(</span><span class="o">*</span><span class="n">entries</span><span class="p">[</span><span class="mi">0</span><span class="p">])[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">solution</span><span class="p">.</span><span class="n">x1</span><span class="p">;</span>
      <span class="n">FlatVector</span><span class="o">::</span><span class="n">GetData</span><span class="o">&lt;</span><span class="kt">double</span><span class="o">&gt;</span><span class="p">(</span><span class="o">*</span><span class="n">entries</span><span class="p">[</span><span class="mi">1</span><span class="p">])[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">solution</span><span class="p">.</span><span class="n">x2</span><span class="p">;</span>
    <span class="p">}</span>
  <span class="p">}</span>
<span class="p">};</span>

<span class="kr">inline</span> <span class="kt">void</span> <span class="nf">SolveQuadraticEquation2ScalarFunc</span><span class="p">(</span><span class="n">DataChunk</span> <span class="o">&amp;</span><span class="n">args</span><span class="p">,</span>
                                              <span class="n">ExpressionState</span> <span class="o">&amp;</span><span class="n">state</span><span class="p">,</span>
                                              <span class="n">Vector</span> <span class="o">&amp;</span><span class="n">result</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">auto</span> <span class="o">&amp;</span><span class="n">a_vector</span> <span class="o">=</span> <span class="n">args</span><span class="p">.</span><span class="n">data</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
  <span class="k">auto</span> <span class="o">&amp;</span><span class="n">b_vector</span> <span class="o">=</span> <span class="n">args</span><span class="p">.</span><span class="n">data</span><span class="p">[</span><span class="mi">1</span><span class="p">];</span>
  <span class="k">auto</span> <span class="o">&amp;</span><span class="n">c_vector</span> <span class="o">=</span> <span class="n">args</span><span class="p">.</span><span class="n">data</span><span class="p">[</span><span class="mi">2</span><span class="p">];</span>

  <span class="n">GenericExecutor</span><span class="o">::</span><span class="n">ExecuteTernary</span><span class="o">&lt;</span>
      <span class="n">PrimitiveType</span><span class="o">&lt;</span><span class="kt">double</span><span class="o">&gt;</span><span class="p">,</span> <span class="n">PrimitiveType</span><span class="o">&lt;</span><span class="kt">double</span><span class="o">&gt;</span><span class="p">,</span>
      <span class="n">PrimitiveType</span><span class="o">&lt;</span><span class="kt">double</span><span class="o">&gt;</span><span class="p">,</span> <span class="n">QuadraticEquationSolution</span><span class="o">&gt;</span><span class="p">(</span>
      <span class="n">a_vector</span><span class="p">,</span> <span class="n">b_vector</span><span class="p">,</span> <span class="n">c_vector</span><span class="p">,</span> <span class="n">result</span><span class="p">,</span> <span class="n">args</span><span class="p">.</span><span class="n">size</span><span class="p">(),</span>
      <span class="p">[</span><span class="o">&amp;</span><span class="p">](</span><span class="n">PrimitiveType</span><span class="o">&lt;</span><span class="kt">double</span><span class="o">&gt;</span> <span class="n">a</span><span class="p">,</span> <span class="n">PrimitiveType</span><span class="o">&lt;</span><span class="kt">double</span><span class="o">&gt;</span> <span class="n">b</span><span class="p">,</span>
          <span class="n">PrimitiveType</span><span class="o">&lt;</span><span class="kt">double</span><span class="o">&gt;</span> <span class="n">c</span><span class="p">)</span> <span class="p">{</span>
        <span class="k">auto</span> <span class="n">discriminant</span> <span class="o">=</span> <span class="n">b</span><span class="p">.</span><span class="n">val</span> <span class="o">*</span> <span class="n">b</span><span class="p">.</span><span class="n">val</span> <span class="o">-</span> <span class="mi">4</span> <span class="o">*</span> <span class="n">a</span><span class="p">.</span><span class="n">val</span> <span class="o">*</span> <span class="n">c</span><span class="p">.</span><span class="n">val</span><span class="p">;</span>
        <span class="n">QuadraticEquationSolution</span> <span class="n">solution</span><span class="p">;</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">discriminant</span> <span class="o">&gt;=</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
          <span class="n">solution</span><span class="p">.</span><span class="n">exists</span> <span class="o">=</span> <span class="nb">true</span><span class="p">;</span>
          <span class="n">solution</span><span class="p">.</span><span class="n">x1</span> <span class="o">=</span> <span class="p">(</span><span class="o">-</span><span class="n">b</span><span class="p">.</span><span class="n">val</span> <span class="o">+</span> <span class="n">sqrt</span><span class="p">(</span><span class="n">discriminant</span><span class="p">))</span> <span class="o">/</span> <span class="p">(</span><span class="mi">2</span> <span class="o">*</span> <span class="n">a</span><span class="p">.</span><span class="n">val</span><span class="p">);</span>
          <span class="n">solution</span><span class="p">.</span><span class="n">x2</span> <span class="o">=</span> <span class="p">(</span><span class="o">-</span><span class="n">b</span><span class="p">.</span><span class="n">val</span> <span class="o">-</span> <span class="n">sqrt</span><span class="p">(</span><span class="n">discriminant</span><span class="p">))</span> <span class="o">/</span> <span class="p">(</span><span class="mi">2</span> <span class="o">*</span> <span class="n">a</span><span class="p">.</span><span class="n">val</span><span class="p">);</span>
        <span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
          <span class="n">solution</span><span class="p">.</span><span class="n">exists</span> <span class="o">=</span> <span class="nb">false</span><span class="p">;</span>
        <span class="p">}</span>
        <span class="k">return</span> <span class="n">solution</span><span class="p">;</span>
      <span class="p">});</span>
<span class="p">}</span>

<span class="c1">// ...</span>

<span class="k">static</span> <span class="kt">void</span> <span class="nf">LoadInternal</span><span class="p">(</span><span class="n">DatabaseInstance</span> <span class="o">&amp;</span><span class="n">instance</span><span class="p">)</span> <span class="p">{</span>
  <span class="c1">// ...</span>

  <span class="k">auto</span> <span class="n">solve_quadratic_equation_scalar_function2</span> <span class="o">=</span> <span class="n">ScalarFunction</span><span class="p">(</span>
      <span class="s">"solve_quadratic_equation2"</span><span class="p">,</span>
      <span class="p">{</span><span class="n">LogicalType</span><span class="o">::</span><span class="n">DOUBLE</span><span class="p">,</span> <span class="n">LogicalType</span><span class="o">::</span><span class="n">DOUBLE</span><span class="p">,</span> <span class="n">LogicalType</span><span class="o">::</span><span class="n">DOUBLE</span><span class="p">},</span>
      <span class="n">LogicalType</span><span class="o">::</span><span class="n">STRUCT</span><span class="p">(</span><span class="n">quadratic_equation_solution_child_types</span><span class="p">),</span>
      <span class="n">SolveQuadraticEquation2ScalarFunc</span><span class="p">);</span>
  <span class="n">ExtensionUtil</span><span class="o">::</span><span class="n">RegisterFunction</span><span class="p">(</span><span class="n">instance</span><span class="p">,</span>
                                  <span class="n">solve_quadratic_equation_scalar_function2</span><span class="p">);</span>

  <span class="c1">// ...</span>
<span class="p">}</span>
</code></pre></div></div>
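
<p>One caveat about the lambda above: it divides by <code class="language-plaintext highlighter-rouge">2 * a</code>, so for <code class="language-plaintext highlighter-rouge">a = 0</code> the discriminant is <code class="language-plaintext highlighter-rouge">b * b</code> (never negative) and the "solution" consists of NaN or infinite values. The following is a minimal standalone sketch in plain C++17, outside DuckDB, of the same logic with that case guarded. The helper name <code class="language-plaintext highlighter-rouge">SolveQuadratic</code> and the use of <code class="language-plaintext highlighter-rouge">std::optional</code> are assumptions made for illustration; they are not part of the extension or of the DuckDB API.</p>

```cpp
#include <cmath>
#include <optional>
#include <utility>

// Hypothetical standalone helper (not part of the DuckDB extension).
// Returns the real roots of a*x^2 + b*x + c = 0, or std::nullopt when the
// equation is not quadratic (a == 0) or has no real roots.
inline std::optional<std::pair<double, double>> SolveQuadratic(double a,
                                                               double b,
                                                               double c) {
  if (a == 0.0) {
    // Dividing by 2 * a would produce NaN or infinity.
    return std::nullopt;
  }
  const double discriminant = b * b - 4 * a * c;
  if (discriminant < 0) {
    // Only complex roots exist.
    return std::nullopt;
  }
  const double root = std::sqrt(discriminant);
  return std::make_pair((-b + root) / (2 * a), (-b - root) / (2 * a));
}
```

<p>In the scalar function, the same check could be folded into the existing condition so that <code class="language-plaintext highlighter-rouge">solution.exists</code> is only set to <code class="language-plaintext highlighter-rouge">true</code> when <code class="language-plaintext highlighter-rouge">a</code> is non-zero and the discriminant is non-negative.</p>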

<h3 id="taking-a-struct-as-input-argument">Taking a STRUCT as input argument</h3>

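<p><code class="language-plaintext highlighter-rouge">StructTypeBinary</code> (and the other <code class="language-plaintext highlighter-rouge">StructType...</code> types) can be used not only as a result type but also as an input type. Let’s implement a <code class="language-plaintext highlighter-rouge">quadratic_equation_from_solution(STRUCT(x1 DOUBLE, x2 DOUBLE)) -&gt; VARCHAR</code> function that outputs a quadratic equation for which <code class="language-plaintext highlighter-rouge">x1</code> and <code class="language-plaintext highlighter-rouge">x2</code> are solutions. Expanding <code class="language-plaintext highlighter-rouge">(x - x1)(x - x2)</code> gives <code class="language-plaintext highlighter-rouge">x^2 - (x1 + x2)x + x1 * x2</code>, which is where the coefficients <code class="language-plaintext highlighter-rouge">b</code> and <code class="language-plaintext highlighter-rouge">c</code> in the code come from:</p>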

<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kr">inline</span> <span class="kt">void</span> <span class="nf">QuadraticEquationFromSolutionScalarFunc</span><span class="p">(</span><span class="n">DataChunk</span> <span class="o">&amp;</span><span class="n">args</span><span class="p">,</span>
                                                    <span class="n">ExpressionState</span> <span class="o">&amp;</span><span class="n">state</span><span class="p">,</span>
                                                    <span class="n">Vector</span> <span class="o">&amp;</span><span class="n">result</span><span class="p">)</span> <span class="p">{</span>
  <span class="k">auto</span> <span class="o">&amp;</span><span class="n">solution_vector</span> <span class="o">=</span> <span class="n">args</span><span class="p">.</span><span class="n">data</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>

  <span class="n">GenericExecutor</span><span class="o">::</span><span class="n">ExecuteUnary</span><span class="o">&lt;</span><span class="n">StructTypeBinary</span><span class="o">&lt;</span><span class="kt">double</span><span class="p">,</span> <span class="kt">double</span><span class="o">&gt;</span><span class="p">,</span>
                                <span class="n">PrimitiveType</span><span class="o">&lt;</span><span class="n">string_t</span><span class="o">&gt;&gt;</span><span class="p">(</span>
      <span class="n">solution_vector</span><span class="p">,</span> <span class="n">result</span><span class="p">,</span> <span class="n">args</span><span class="p">.</span><span class="n">size</span><span class="p">(),</span>
      <span class="p">[</span><span class="o">&amp;</span><span class="p">](</span><span class="n">StructTypeBinary</span><span class="o">&lt;</span><span class="kt">double</span><span class="p">,</span> <span class="kt">double</span><span class="o">&gt;</span> <span class="n">solution</span><span class="p">)</span> <span class="p">{</span>
        <span class="kt">double</span> <span class="n">x1</span> <span class="o">=</span> <span class="n">solution</span><span class="p">.</span><span class="n">a_val</span><span class="p">;</span>
        <span class="kt">double</span> <span class="n">x2</span> <span class="o">=</span> <span class="n">solution</span><span class="p">.</span><span class="n">b_val</span><span class="p">;</span>

        <span class="kt">double</span> <span class="n">b</span> <span class="o">=</span> <span class="o">-</span><span class="n">x1</span> <span class="o">-</span> <span class="n">x2</span><span class="p">;</span>
        <span class="kt">double</span> <span class="n">c</span> <span class="o">=</span> <span class="n">x1</span> <span class="o">*</span> <span class="n">x2</span><span class="p">;</span>
        <span class="k">return</span> <span class="n">StringVector</span><span class="o">::</span><span class="n">AddString</span><span class="p">(</span><span class="n">result</span><span class="p">,</span> <span class="s">"x^2 + "</span> <span class="o">+</span> <span class="n">to_string</span><span class="p">(</span><span class="n">b</span><span class="p">)</span> <span class="o">+</span>
                                                   <span class="s">"x + "</span> <span class="o">+</span> <span class="n">to_string</span><span class="p">(</span><span class="n">c</span><span class="p">));</span>
      <span class="p">});</span>
<span class="p">}</span>

<span class="c1">// ...</span>

<span class="k">static</span> <span class="kt">void</span> <span class="nf">LoadInternal</span><span class="p">(</span><span class="n">DatabaseInstance</span> <span class="o">&amp;</span><span class="n">instance</span><span class="p">)</span> <span class="p">{</span>
  <span class="c1">// ...</span>

  <span class="k">auto</span> <span class="n">quadratic_equation_from_solution_scalar_function</span> <span class="o">=</span> <span class="n">ScalarFunction</span><span class="p">(</span>
      <span class="s">"quadratic_equation_from_solution"</span><span class="p">,</span>
      <span class="p">{</span><span class="n">LogicalType</span><span class="o">::</span><span class="n">STRUCT</span><span class="p">(</span><span class="n">quadratic_equation_solution_child_types</span><span class="p">)},</span>
      <span class="n">LogicalType</span><span class="o">::</span><span class="n">VARCHAR</span><span class="p">,</span> <span class="n">QuadraticEquationFromSolutionScalarFunc</span><span class="p">);</span>
  <span class="n">ExtensionUtil</span><span class="o">::</span><span class="n">RegisterFunction</span><span class="p">(</span>
      <span class="n">instance</span><span class="p">,</span> <span class="n">quadratic_equation_from_solution_scalar_function</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>D select quadratic_equation_from_solution({ 'x1': 4, 'x2': 3});
┌──────────────────────────────────────────────────────────────────────┐
│ quadratic_equation_from_solution(main.struct_pack(x1 := 4, x2 := 3)) │
│                               varchar                                │
├──────────────────────────────────────────────────────────────────────┤
│ x^2 + -7.000000x + 12.000000                                         │
└──────────────────────────────────────────────────────────────────────┘
</code></pre></div></div>
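
<p>The output above prints a negative coefficient as <code class="language-plaintext highlighter-rouge">+ -7.000000x</code>, which is correct but not very readable. As a possible refinement, here is a minimal standalone sketch in plain C++ (outside DuckDB) that builds the same equation with explicit signs and default stream formatting. The helper name <code class="language-plaintext highlighter-rouge">FormatMonicQuadratic</code> is hypothetical and not part of the example repository.</p>

```cpp
#include <cmath>
#include <sstream>
#include <string>

// Hypothetical helper (not part of the extension): formats the monic
// quadratic (x - x1)(x - x2) = x^2 + b*x + c with explicit signs.
inline std::string FormatMonicQuadratic(double x1, double x2) {
  const double b = -(x1 + x2);  // coefficient of x: minus the sum of the roots
  const double c = x1 * x2;     // constant term: product of the roots
  std::ostringstream out;
  // Print the sign separately and the magnitude with std::abs, so a negative
  // coefficient becomes "- 7x" instead of "+ -7x".
  out << "x^2 " << (b < 0 ? "- " : "+ ") << std::abs(b) << "x "
      << (c < 0 ? "- " : "+ ") << std::abs(c);
  return out.str();
}
```

<p>For the roots <code class="language-plaintext highlighter-rouge">4</code> and <code class="language-plaintext highlighter-rouge">3</code> from the query above, this sketch yields <code class="language-plaintext highlighter-rouge">x^2 - 7x + 12</code>. Inside the scalar function, the resulting string would still have to be passed through <code class="language-plaintext highlighter-rouge">StringVector::AddString</code> as in the original code.</p>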

<p>Any type that has a static <code class="language-plaintext highlighter-rouge">ConstructType</code> method (see, for example, the <a href="https://github.com/duckdb/duckdb/blob/5f5512b827df6397afd31daedb4bbdee76520019/src/include/duckdb/common/vector_operations/generic_executor.hpp#L99"><code class="language-plaintext highlighter-rouge">StructTypeBinary</code> implementation</a>) can be used to represent an input type. As that implementation shows, writing <code class="language-plaintext highlighter-rouge">ConstructType</code> is more complicated than writing <code class="language-plaintext highlighter-rouge">AssignResult</code>. Also note that while <code class="language-plaintext highlighter-rouge">GenericListType</code> implements <code class="language-plaintext highlighter-rouge">AssignResult</code>, it currently does <a href="https://github.com/duckdb/duckdb/blob/5f5512b827df6397afd31daedb4bbdee76520019/src/include/duckdb/common/vector_operations/generic_executor.hpp#L222"><em>not</em> implement <code class="language-plaintext highlighter-rouge">ConstructType</code></a>, so it cannot be used as an input type.</p>

<h2 id="where-to-find-more-examples">Where to find more examples</h2>

<p>That’s it for now! This post still only scratches the surface of scalar functions in DuckDB – as demonstrated by the <a href="https://github.com/duckdb/duckdb/blob/5f5512b827df6397afd31daedb4bbdee76520019/src/function/scalar_function.cpp#L11">extensive constructor of <code class="language-plaintext highlighter-rouge">ScalarFunction</code></a>. Nevertheless, this should be enough to get you started with writing your own extensions with scalar functions! All examples in this post are <a href="https://github.com/thomas-daniels/duckdb-extension-scalar-function-examples">bundled in a GitHub repository</a>. You can find more examples in these places:</p>

<ul>
  <li>The DuckDB source code (specifically <a href="https://github.com/duckdb/duckdb/tree/5f5512b827df6397afd31daedb4bbdee76520019/extension/core_functions/scalar">extension/core_functions/scalar</a> and <a href="https://github.com/duckdb/duckdb/tree/5f5512b827df6397afd31daedb4bbdee76520019/src/function/scalar">src/function/scalar</a>) contains the implementation of DuckDB’s built-in functions.</li>
  <li><a href="https://github.com/duckdb/duckdb-spatial">The <code class="language-plaintext highlighter-rouge">spatial</code> extension</a> implements many scalar functions.</li>
  <li>A bit beyond the scope of scalar functions, but <a href="https://github.com/Maxxen/duckdb_ulid">the <code class="language-plaintext highlighter-rouge">ulid</code> extension</a> is a great reference if you want to see how to add a custom data type and its associated functions.</li>
</ul>

<p>If you have any feedback, feel free to open an issue on the <a href="https://github.com/thomas-daniels/duckdb-extension-scalar-function-examples">GitHub repository</a> or contact me <a href="https://bsky.app/profile/thomasd.be">on Bluesky</a>.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[The analytical database system DuckDB offers an extension mechanism allowing you to add your own functionality to the system. The C++ extension template provides a starting point for extensions and contains two example scalar functions. This blog post goes into more detail on how to write such functions.]]></summary></entry></feed>