Frontend Masters Boost RSS Feed
https://frontendmasters.com/blog
Helping Your Journey to Senior Developer

The Moment You Need a Database
https://frontendmasters.com/blog/the-moment-you-need-a-database/
Mon, 10 Mar 2025 22:33:35 +0000

Not all websites need a database.

I was just drooling over the new Mac Studio landing page on apple.com. It’s a beautiful piece of web design that serves an important purpose for a big company, and really doesn’t need any technology other than HTML, CSS, and JavaScript. “Brochure” sites like this are classic examples of sites that don’t need a database. You’ll sometimes hear them called “awards” sites, as they tend to be beautifully focused one-offs that don’t need to live forever within a long-lasting system.

Even sites with a lot more pages don’t necessarily need a database. You can just duplicate directories or HTML files for a while, even if it is a little repetitive. Modern code editors make updating repetitive code in a code base no sweat.

Doing this kind of thing across almost any number of files is no problem.

At some point, sites with a ton of pages benefit from a more robust templating system. Blog posts use this template, landing pages use this template, etc.

Eleventy is great at combining layouts and content.

Combining content with templates to produce a page can be done a number of ways and entirely without a database.

  • A website with access to a backend language can use that backend language to do the content+template creation on request.
  • A website builder tool with a build process can produce static HTML output from content+template before the files are deployed to the server.
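As a minimal sketch of that second, build-time approach (the names here are made up for illustration, not any particular generator's API), content plus a template function can produce static HTML before anything is deployed:

```javascript
// Hypothetical build step: stamp each piece of content into a template,
// producing one static HTML page per entry.
const template = ({ title, body }) =>
  `<!doctype html><html><head><title>${title}</title></head><body>${body}</body></html>`;

const posts = [
  { slug: "hello", title: "Hello", body: "<p>First post</p>" },
  { slug: "again", title: "Again", body: "<p>Second post</p>" },
];

// A real generator would write these files to disk; here we just build them.
const pages = posts.map((post) => ({
  path: `/${post.slug}/index.html`,
  html: template(post),
}));
```

A real static site generator adds layouts, partials, and asset handling on top of exactly this idea.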

So yeah! You can get pretty far without needing a database at all. Perhaps the most common approach is using a static site generator, of which there are many. Frontend Masters has courses on site builders that can produce static sites like Next.js, Nuxt, and Astro.

So then, when? When do you make the call that, for some particular product, a database is the right decision?

When You Probably Need a Database

  • When you have a login system. You might be able to get away with not storing user auth information at all (e.g. “social auth” only), but even then, those users are likely doing something with your website, and if you need to save what they are doing, you need to associate it with that user, and a database will be the way. If you want to let a user log in and do things, perhaps even across devices, you’ll be storing what they are doing, and their last “state”, in a database.
  • When you have very dynamic data. “Dynamic” meaning changing often. Imagine forums: users logging in, creating new threads, and responding to existing threads. It’s not practical to force a build process to rebuild static files every time a forum changes. Forum data makes much more sense in a database, especially as it can then be retrieved and used in many different ways (recent posts, individual threads, user profiles, etc.) Comment sections are similar and something a database is generally in charge of.
  • When you have relational data. Imagine a real estate app where the users are realtors, realtors represent homes, but a home might have multiple realtors, and those realtors belong to realtor groups, etc. These are all tables of data that can reference each other, which is the strength (and point) of relational databases like Postgres. (see a Complete Intro to SQL & PostgreSQL)
  • When you present things that need to be searched, filtered, or sorted. A database excels at jobs like “give me these specific things sorted in descending order by date” or the like. Producing static files of every possible search, sort, or filtering choice (assuming you can’t do that filtering client side) isn’t usually practical.
  • When you just have too much. Just to draw a reasonable line, I’d say at over 100 pages, unless they are super similar types like only blog posts, it’s likely a site benefits from keeping the content for those pages in a database. It might not be “pages” only, it could be users, products, songs, or whatever the main things of the website are.
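For a sense of the search/sort/filter job described above, here it is done client-side with a plain array, which is fine when the whole dataset fits in the browser and impractical once it doesn't (the data is invented for illustration):

```javascript
// "Give me the css-tagged posts, sorted descending by date" is what a
// database does with an indexed query; in memory it looks like this.
const posts = [
  { title: "Grid Tricks", date: "2024-01-05", tag: "css" },
  { title: "Signals", date: "2024-03-01", tag: "js" },
  { title: "Anchor Positioning", date: "2024-02-10", tag: "css" },
];

const results = posts
  .filter((post) => post.tag === "css")
  .sort((a, b) => b.date.localeCompare(a.date));
```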
The forum software Discourse uses Postgres databases.

What are the downsides of a database then? What does it cost you?

It seems like there are a lot of upsides to a database, doesn’t it? It really does a lot for a website and is all but a requirement for bigger, more feature-rich websites. What’s so nice about not having one, such that there are specific tools to avoid it?

  • Databases are more expensive. Hosting that does not include a database tends to be cheaper. Netlify, a classic host built upon a foundation of hosting only static files, has an entirely free plan. A database host might be an entirely separate hosted service which may come with its own costs.
  • Databases require security. Databases are necessarily accessible on the open web, or at least on the private network the website backend has access to. They also tend to contain sensitive data, like private user information. If a nefarious user got more access to the database than they should have, they could download that private information and/or corrupt what is there. This is unfortunately not a maybe: insecure databases will almost certainly be messed with.
  • Databases require backups. Because of the risk of nefarious behavior, and even more likely, your own team making honest mistakes, you need to have backups of the database as often as possible. When dealing with flat files, your Git repository and previous deployments act as your backups, but that doesn’t cover a database. Databases need their own unique backup systems.
  • Databases require maintenance. Database software can update versions, which can include security patches. Add this to your list of responsibilities.
  • Databases require a backend language and ORM. Front-end code doesn’t typically talk directly to a database. It talks to back-end code, which talks to a database. And even back-end code typically isn’t raw SQL queries and the like; it uses an ORM (e.g. Prisma) that provides a friendlier, abstracted syntax for dealing with data. When you work with a database, you end up with lots of additional technology layered onto your stack to make it all work together.
  • Databases likely need a migration strategy. Just having and using a database is work, but the structure of a database has a habit of needing to change. Imagine needing to add an additional column to a table, or change the name or structure of the data within one. These changes are called migrations, and they have their own set of complexity that might become your job.
Databases don’t come alone. You need a backend language to connect to it and likely an ORM like Prisma to work with it effectively.

Implied Databases

This isn’t to say that all of this is immediately your problem when using software that makes use of a database. For instance, this is a WordPress website as I write, which has an SQL database. But we didn’t have to write APIs for it, and we don’t deal with migrations or backups for it. Most WordPress websites I’ve worked with have security implied through the hosting, and backups handled by paid plugins.

So using pre-existing software that makes use of a database is a different situation than an entirely custom project that grows into needing a custom database.

Types of Databases

It should also be noted that databases come in lots of different shapes and sizes.

  • A K:V store (key: value) is perhaps the simplest possible database. This is the foundation of software like Redis and you’ll see companies like Cloudflare offer stores like this.
  • Some databases are “schema-less” in that the bits of data have no defined types, and are more JSON-like in nature: CouchDB, MongoDB, Firebase, etc.
  • Some databases come in the form of hosted services and are content-focused like Sanity and Contentful.
  • Even “classic” SQL databases come in varieties like Postgres and MariaDB. And these have dedicated hosts: SQL has PlanetScale and Cockroach, and Postgres has Supabase.
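To make the first bullet concrete: a key:value store is conceptually just a persistent Map, where get/set/delete by key is the entire query language (a toy in-memory sketch, nothing like how Redis actually works internally):

```javascript
// Toy key:value store; the whole API is get/set/delete by key.
class KVStore {
  #data = new Map();
  set(key, value) { this.#data.set(key, value); }
  get(key) { return this.#data.get(key) ?? null; }
  delete(key) { return this.#data.delete(key); }
}

const kv = new KVStore();
kv.set("session:abc123", { userId: 42 });
```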

Some websites do need a database! Good luck!

CRDTs and Y.js
https://frontendmasters.com/blog/crdts-and-y-js/
Thu, 06 Feb 2025 18:56:26 +0000

Realtime applications (or games!), where multiple users are working at once and the state of the app needs to stay in sync and be persisted, are more and more a common need and expected feature. But it’s not simple. There are products to help (PouchDB for data and Ably for messaging come to mind), but sometimes you need to be handling the core technologies yourself. Often the core technology of syncing is CRDTs (Conflict-free Replicated Data Types), which help data become “eventually consistent” between sources. There is a nice library that helps with this, still at a pretty low level: Yjs. I say all this to point out two nice resources I saw recently:

  1. A Gentle Introduction to CRDTs
  2. Learn Yjs

I suspect there will be more and more work in this area over time.

1 dataset. 100 visualizations.
https://frontendmasters.com/blog/1-dataset-100-visualizations/
Thu, 19 Dec 2024 20:15:41 +0000

Imagine this simple data set:

       Norway  Denmark  Sweden
2004   5       4        13
2002   8       10       15

Pretty simple, but there is interesting stuff going on. Someone might be trying to reference an individual bit of information, but they also might be looking to compare data, or look at the rate of change of the data over time, all of which are things this data has.

The agency Ferdio created 100 visualizations of that data. Why so many? They can be used to emphasize different parts of the story being told with the data (and some are just… bad.)

Introducing TanStack Start
https://frontendmasters.com/blog/introducing-tanstack-start/
Wed, 18 Dec 2024 17:43:51 +0000

The best way to think about TanStack Start is that it’s a thin server layer atop the TanStack Router we already know and love; that means we don’t lose a single thing from TanStack Router. Not only that, but the nature of this server layer allows it to side-step the pain points other web meta-frameworks suffer from.

This is a post I’ve been looking forward to writing for a long time; it’s also a difficult one to write.

The goal (and challenge) will be to show why a server layer on top of a JavaScript router is valuable, and why TanStack Start’s implementation is unique compared to the alternatives (in a good way). From there, showing how TanStack Start actually works will be relatively straightforward. Let’s go!

Please keep in mind that, while this post discusses a lot of generic web performance issues, TanStack Start is still a React-specific meta-framework. It’s not a framework-agnostic tool like Astro.

Why Server Rendering?

Client-rendered web applications, often called “Single Page Applications” or “SPAs”, have been popular for a long time. With this type of app, the server sends down a mostly empty HTML page, possibly with some sort of splash image, loading spinner, or maybe some navigation components. It also includes, very importantly, script tags that load your framework of choice (React, Vue, Svelte, etc.) and a bundle of your application code.

These apps were always fun to build, and in spite of the hate they often get, they (usually) worked just fine (any kind of software can be bad). Admittedly, they suffer a big disadvantage: initial render performance. Remember, the initial render of the page was just an empty shell of your app. This displayed while your script files loaded and executed, and once those scripts were run, your application code would most likely need to request data before your actual app could display. Under the covers, your app is doing something like this:

The initial render of the page, from the web server, renders only an empty shell of your application. Then some scripts are requested, and then parsed and executed. When those application scripts run, you (likely) send some other requests for data. Once that is done, your page displays.

To put it more succinctly, with client-rendered web apps, when the user first loads your app, they’ll just get a loading spinner. Maybe your company’s logo above it, if they’re lucky.

This is perhaps an overstatement. Users may not even notice the delay caused by these scripts loading (which are likely cached), or hydration, which is probably fast. Depending on the speed of their network, and the type of application, this stuff might not matter much.

Maybe.

But if our tools now make it easy to do better, why not do better?

Server Side Rendering

With SSR, the picture looks more like this:

The server sends down the complete, finished page that the user can see immediately. We do still need to load our scripts and hydrate, so our page can be interactive. But that’s usually fast, and the user will still have content to see while that happens.

Our hypothetical user now looks like this, since the server is responding with a full page the user can see.

Streaming

We made one implicit assumption above: that our data was fast. If our data was slow to load, our server would be slow to respond. It’s bad for the user to be stuck looking at a loading spinner, but it’s even worse for the user to be stuck looking at a blank screen while the server churns.

As a solution for this, we can use something called “streaming,” or “out of order streaming” to be more precise. The user still requests all the data, as before, but we tell our server “don’t wait for this/these data, which are slow: render everything else, now, and send that slow data to the browser when it’s ready.”

All modern meta-frameworks support this, and our picture now looks like this:

To put a finer point on it, the server does still initiate the request for our slow data immediately, on the server during our initial navigation. It just doesn’t block the initial render, and instead pushes down the data when ready. We’ll look at streaming with Start later in this post.
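The shape of out-of-order streaming can be sketched with plain promises: kick off the slow request, send the fast shell immediately, and append the slow data to the same response when it resolves (a conceptual sketch, not any framework's actual wire format):

```javascript
// Conceptual out-of-order streaming: the slow fetch starts immediately,
// but the response doesn't wait for it before sending the shell.
function loadSlowData() {
  return new Promise((resolve) => setTimeout(() => resolve("slow data"), 50));
}

async function renderPage() {
  const slowData = loadSlowData(); // started now, NOT awaited yet
  const chunks = ["<h1>Fast shell</h1><div id='slot'>loading…</div>"];
  // The shell above could be flushed to the browser here; the final
  // chunk fills the slot in once the slow promise resolves.
  chunks.push(`<script>slot.textContent = ${JSON.stringify(await slowData)}</script>`);
  return chunks;
}
```

Real implementations do this over an actual HTTP stream rather than an array of chunks, but the ordering is the same: shell first, slow data appended later.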

Why did we ever do client-rendering?

I’m not here to tear down client-rendered apps. They were, and frankly still are, an incredible way to ship deeply interactive user experiences with JavaScript frameworks like React and Vue. The fact of the matter is, server rendering a web app built with React was tricky to get right. You not only needed to server render and send down the HTML for the page the user requested, but also send down the data for that page, and hydrate everything just right on the client.

It’s hard to get right. But here’s the thing: getting this right is one of the primary purposes of this new generation of meta-frameworks. Next, Nuxt, Remix, SvelteKit, and SolidStart are some of the more famous examples of these meta-frameworks. And now TanStack Start.

Why is TanStack Start different?

Why do we need a new meta-framework? There are many possible answers to that question, but I’ll give mine. Existing meta-frameworks suffer from some variation on the same issue. They’ll provide some mechanism to load data on the server. This mechanism is often called a “loader,” or in the case of Next, it’s just RSCs (React Server Components). In Next’s (older) pages directory, it’s the getServerSideProps function. The specifics don’t matter. What matters is, for each route, whether on the initial load of the page or on client-side navigation via links, some server-side code will run, send down the data, and then render the new page.

Need to bone up on React in general? Brian Holt’s Complete Intro to React and Intermediate React will get you there.

An Impedance Mismatch is Born

Notice the two worlds that exist: the server, where data loading code will always run, and the client. It’s the difference and separation between these worlds that can cause issues.

For example, frameworks always provide some mechanism to mutate data, and then re-fetch to show updated state. Imagine your loader for a page loads some tasks, user settings, and announcements. When the user edits a task, and revalidates, these frameworks will almost always re-run the entire loader, and superfluously re-load the user’s announcements and user settings, in addition to tasks, even though tasks are the only thing that changed.

Are there fixes? Of course. Many frameworks will allow you to create extra loaders to spread the data loading across, and revalidate only some of them. Other frameworks encourage you to cache your data. These solutions all work, but come with their own tradeoffs. And remember, they’re solutions to a problem that meta-frameworks created, by having server-side loading code for every path in your app.

Or what about a loader that loads 5 different pieces of data? After the page loads, the user starts browsing around, occasionally coming back to that first page. These frameworks will usually cache that previously-displayed page, for a time. Or not. But it’s all or none. When the loader re-runs, all 5 pieces of data will re-fire, even if 4 of them can be cached safely.

You might think using a component-level data loading solution like react-query can help. react-query is great, but it doesn’t eliminate these problems. If you have two different pages that each have 5 data sources, of which 4 are shared in common, browsing from the first page to the second will cause the second page to re-request all 5 pieces of data, even though 4 of them are already present in client-side state from the first page. The server is unaware of what happens to exist on the client. The server is not keeping track of what state you have in your browser; in fact the “server” might just be a Lambda function that spins up, satisfies your request, and then dies off.

In the picture, we can see a loader from the server sending down data for queryB, which we already have in our TanStack cache.
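The cache-awareness react-query brings can be sketched as: before fetching, check which query keys are already cached and request only the misses (a toy illustration of the idea, not react-query's real implementation):

```javascript
// Toy query cache: only fetch the keys we don't already hold.
const cache = new Map();

async function ensureQueries(keys, fetcher) {
  const missing = keys.filter((key) => !cache.has(key));
  for (const key of missing) cache.set(key, await fetcher(key));
  return missing; // which keys actually triggered a fetch
}

const fakeFetch = async (key) => `data:${key}`;

// First page needs five queries; a second page shares four of them.
const first = await ensureQueries(["A", "B", "C", "D", "E"], fakeFetch);
const second = await ensureQueries(["A", "B", "C", "D", "F"], fakeFetch);
// Only "F" is actually fetched the second time.
```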

Where to, from here?

The root problem is that these meta-frameworks inevitably have server-only code running on each path, integrating with long-running client-side state. This leads to conflicts and inefficiencies which need to be managed. There are ways of handling these things, which I touched on above. But it’s not a completely clean fit.

How much does it matter?

Let’s be clear right away: if this situation is killing performance of your site, you have bigger problems. If these extra calls are putting undue strain on your services, you have bigger problems.

That said, one of the first rules of distributed systems is to never trust your network. The more of these calls we’re firing off, the better the chances that some of them might randomly be slow for some reason beyond our control. Or fail.

We typically tolerate requesting more than we need in these scenarios because it’s hard to avoid with our current tooling. But I’m here to show you some new, better tooling that side-steps these issues altogether.

Isomorphic Loaders

In TanStack, we do have loaders. These are defined by TanStack Router. I wrote a three-part series on Router here. If you haven’t read that, and aren’t familiar with Router, give it a quick look.

Start takes what we already have with Router and adds server handling to it. On the initial load, your loader will run on the server, load your data, and send it down. On all subsequent client-side navigations, your loader will run on the client, like it already does, with access to any client-side state, cache, etc. If you like react-query, you’ll be happy to know that’s integrated too. Your react-query queryClient can run on the server to load and send down data on the initial page load. On subsequent navigations, these loaders will run on the client, which means your queryClient will have full access to the usual client-side cache react-query always uses. That means it will know what does, and does not, need to be loaded.

It’s honestly such a refreshing, simple, and most importantly, effective pattern that it’s hard not to be annoyed that none of the other frameworks thought of it first. Admittedly, SvelteKit does have universal loaders which are isomorphic in the same way, but without a component-level query library like react-query integrated with the server.

TanStack Start

Enough setup, let’s look at some code. TanStack Start is still in beta, so some of the setup is still a bit manual, for now.

The repo for this post is here.

If you’d like to set something up yourself, check out the getting started guide. If you’d like to use react-query, be sure to add the library for that. You can see an example here. Depending on when you read this, there might be a CLI to do all of this for you.

This post will continue to use the same code I used in my prior posts on TanStack Router. I set up a new Start project, copied over all the route code, and tweaked a few import paths since the default Start project has a slightly different folder structure. I also removed all of the artificial delays, unless otherwise noted. I want our data to be fast by default, and slow in a few places where we’ll use streaming to manage the slowness.

We’re not building anything new, here. We’re taking existing code, and moving the data loading up to the server in order to get it requested sooner, and improve our page load times. This means everything we already know and love about TanStack Router is still 100% valid.

Start does not replace Router; Start improves Router.

Loading Data

All of the routes and loaders we set up with Router are still valid. Start sits on top of Router and adds server processing. Our loaders will execute on the server for the first load of the page, and then on the client as the user browses. But there’s a small problem. While the server environment these loaders will execute in does indeed have a fetch function, there are differences between client-side fetch, and server-side fetch—for example, cookies, and fetching to relative paths.

To solve this, Start lets you define a server function. Server functions can be called from the client, or from the server; but the server function itself always executes on the server. You can define a server function in the same file as your route, or in a separate file; if you do the former, TanStack will do the work of ensuring that server-only code does not ever exist in your client bundle.

Let’s define a server function to load our tasks, and then call it from the tasks loader.

import { getCookie } from "vinxi/http";
import { createServerFn } from "@tanstack/start";
import { Task } from "../../types";

export const getTasksList = createServerFn({ method: "GET" }).handler(async () => {
  const result = getCookie("user");

  return fetch(`http://localhost:3000/api/tasks`, { method: "GET", headers: { Cookie: "user=" + result } })
    .then(resp => resp.json())
    .then(res => res as Task[]);
});

We have access to a getCookie utility from the vinxi library on which Start is built. Server functions actually provide a lot more functionality than this simple example shows. Be sure to check out the docs to learn more.

If you’re curious about this fetch call:

fetch(`http://localhost:3000/api/tasks`, { method: "GET", headers: { Cookie: "user=" + result } });

That’s how I’m loading data for this project, on the server. I have a separate project running a set of Express endpoints querying a simple SQLite database. You can fetch your data however you need from within these server functions, be it via an ORM like Drizzle, an external service endpoint like I have here, or you could connect right to a database and query what you need. But that latter option should probably be discouraged for production applications.

Now we can call our server function from our loader.

loader: async ({ context }) => {
    const now = +new Date();
    console.log(`/tasks/index path loader. Loading tasks at + ${now - context.timestarted}ms since start`);
    const tasks = await getTasksList();
    return { tasks };
  },

That’s all there is to it. It’s almost anti-climactic. The page loads, as it did in the last post. Except now it server renders. You can shut JavaScript off, and the page will still load and display (and hyperlinks will still work).

Streaming

Let’s make the individual task loading purposefully slow (we’ll just keep the delay that was already in there), so we can see how to stream it in. Here’s our server function to load a single task.

export const getTask = createServerFn({ method: "GET" })
  .validator((id: string) => id)
  .handler(async ({ data }) => {
    return fetch(`http://localhost:3000/api/tasks/${data}`, { method: "GET" })
      .then(resp => resp.json())
      .then(res => res as Task);
  });

Note the validator function, which is how we strongly type our server function (and validate the inputs). But otherwise it’s more of the same.

Now let’s call it in our loader, and see about enabling streaming.

Here’s our loader:

loader: async ({ params, context }) => {
    const { taskId } = params;

    const now = +new Date();
    console.log(`/tasks/${taskId} path loader. Loading at + ${now - context.timestarted}ms since start`);
    const task = getTask({ data: taskId });

    return { task };
  },

Did you catch it? We called getTask without awaiting it. That means task is a promise, which Start and Router allow us to return from our loader (you could name it taskPromise if you like that specificity in naming).

But how do we consume this promise, show loading state, and await the real value? There are two ways. TanStack Router defines an Await component for this. But if you’re using React 19, you can use the new use pseudo-hook.

import { use } from "react";

function TaskView() {
  const { task: taskPromise } = Route.useLoaderData();
  const { isFetching } = Route.useMatch();

  const task = use(taskPromise);

  return (
    <div>
      <Link to="/app/tasks">Back to tasks list</Link>
      <div className="flex flex-col gap-2">
        <div>
          Task {task.id} {isFetching ? "Loading ..." : null}
        </div>
        <h1>{task.title}</h1>
        <Link 
          params={{ taskId: task.id }}
          to="/app/tasks/$taskId/edit"
        >
          Edit
        </Link>
        <div />
      </div>
    </div>
  );
}

The use hook will cause the component to suspend, and show the nearest Suspense boundary in the tree. Fortunately, the pendingComponent you set up in Router also doubles as a Suspense boundary. TanStack is impressively well integrated with modern React features.

Now when we load an individual task’s page, we’ll first see the overview data, which loaded quickly and server rendered, above the Suspense boundary for the task data we’re streaming.

When the task comes in, the promise will resolve, the server will push the data down, and our use call will provide data for our component.

React Query

As before, let’s integrate react-query. And, as before, there’s not much to do. Since we added the @tanstack/react-router-with-query package when we got started, our queryClient will be available on the server, and will sync up with the queryClient on the client, and put data (or in-flight streamed promises) into cache.

Let’s start with our main epics page. Our loader looked like this before:

async loader({ context, deps }) {
    const queryClient = context.queryClient;

    queryClient.ensureQueryData(
      epicsQueryOptions(context.timestarted, deps.page)
    );
    queryClient.ensureQueryData(
      epicsCountQueryOptions(context.timestarted)
    );
  }

That would kick off the requests on the server, but let the page render, and then suspend in the component that called useSuspenseQuery—what we’ve been calling streaming.

Let’s change it to actually load our data in our loader, and server render the page instead. The change couldn’t be simpler.

async loader({ context, deps }) {
  const queryClient = context.queryClient;

  await Promise.allSettled([
    queryClient.ensureQueryData(
      epicsQueryOptions(context.timestarted, deps.page)
    ),
    queryClient.ensureQueryData(
      epicsCountQueryOptions(context.timestarted)
    ),
  ]);
},

Note we’re awaiting a Promise.allSettled call here so the queries can run together. Make sure you don’t sequentially await each individual call, as that would create a waterfall, or use Promise.all, as that will quit immediately if any of the promises error out.
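That difference is easy to demonstrate on its own, independent of TanStack (a standalone sketch with made-up promise names):

```javascript
// Promise.allSettled waits for every promise and reports each outcome,
// rather than rejecting as soon as any one promise fails.
const fast = Promise.resolve("epics");
const failing = Promise.reject(new Error("count query failed"));

const settled = await Promise.allSettled([fast, failing]);
// Each entry has a status of "fulfilled" or "rejected", so one failed
// query doesn't discard the data the other queries loaded successfully.
```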

Streaming with react-query

As I implied above, to stream data with react-query, do the exact same thing, but don’t await the promise. Let’s do that on the page for viewing an individual epic.

loader: ({ context, params }) => {
  const { queryClient, timestarted } = context;

  queryClient.ensureQueryData(
    epicQueryOptions(timestarted, params.epicId)
  );
},

Now if this page is loaded initially, the query for this data will start on the server and stream to the client. If the data are pending, our suspense boundary will show, triggered automatically by react-query’s useSuspenseQuery hook.

If the user browses to this page from a different page, the loader will instead run on the client, but still fetch those same data from the same server function, and trigger the same suspense boundary.

Parting Thoughts

I hope this post was useful to you. It wasn’t a deep dive into TanStack Start — the docs are a better venue for that. Instead, I hope I was able to show why server rendering can offer almost any web app a performance boost, and why TanStack Start is a superb tool for doing so. Not only does it simplify a great deal of things by running loaders isomorphically, but it even integrates wonderfully with react-query.

The react-query integration is especially exciting to me. It delivers component-level data fetching while still allowing for server fetching, and streaming—all without sacrificing one bit of convenience.

Drizzle Database Migrations
https://frontendmasters.com/blog/drizzle-database-migrations/
Mon, 09 Dec 2024 15:23:12 +0000

Drizzle ORM is an incredibly impressive object-relational mapper (ORM). Like traditional ORMs, it offers a domain-specific language (DSL) for querying entire object graphs. Imagine grabbing some “tasks”, along with “comments” on those tasks from your database. But unlike traditional ORMs, it also exposes SQL itself via a thin, strongly typed API. This allows you to write complex queries using things like MERGE, UNION, CTEs, and so on, but in a strongly typed API that looks incredibly similar to the SQL you already know (and hopefully love).

I wrote about Drizzle previously. That post focused exclusively on the typed SQL API. This post will look at another Drizzle feature: database migrations. Not only will Drizzle allow you to query your database via a strongly typed API, but it will also keep your object model and database in sync. Let’s get started!

Our Database

Drizzle supports Postgres, MySQL, and SQLite. For this post we’ll be using Postgres, but the idea is the same for all of them. If you’d like to follow along at home, I urge you to use Docker to spin up a Postgres database (or MySQL, if that’s your preference). If you’re completely new to Docker, it’s not terribly hard to get it installed. Once you have it installed, run this:

docker container run -e POSTGRES_USER=docker -e POSTGRES_PASSWORD=docker -p 5432:5432 postgres:17.2-alpine3.20

That should get you a Postgres instance up and running that you can connect to on localhost, with a username and password of docker / docker. When you stop that process, your database will vanish into the ether. Restarting that same process will create a brand new Postgres instance with a completely clean slate, making this especially convenient for the type of testing we’re about to do: database migrations.

Incidentally, if you’d like to run a database that actually persists its data on your machine, you can mount a volume.

docker container run -e POSTGRES_USER=docker -e POSTGRES_PASSWORD=docker -p 5432:5432 -v /Users/arackis/Documents/pg-data:/var/lib/postgresql/data postgres:17.2-alpine3.20

That does the same thing, while telling Docker to map the directory /Users/arackis/Documents/pg-data on your machine onto /var/lib/postgresql/data inside the container (where Postgres stores its data). Adjust the former path as desired. (The latter isn’t up for debate, as that’s what Postgres uses.)

Setting Up

We’ll get an empty app up (npm init is all we need), and then install a few things.

npm i drizzle-orm drizzle-kit pg

The drizzle-orm package is the main ORM that handles querying your database. The drizzle-kit package is what handles database migrations, which will be particularly relevant for this post. Lastly, the pg package is the Node Postgres driver.

Configuring Drizzle

Let’s start by adding a drizzle.config.ts to the root of our project.

import { defineConfig } from "drizzle-kit";

export default defineConfig({
  dialect: "postgresql",
  out: "./drizzle-schema",
  dbCredentials: {
    database: "jira",
    host: "localhost",
    port: 5432,
    user: "docker",
    password: "docker",
    ssl: false,
  },
});

We tell Drizzle what kind of database we’re using (Postgres), where to put the generated schema code (the drizzle-schema folder), and then the database connection info.

Wanna see more real-world use cases of Drizzle in action? Check out Scott Moss’s course Intermediate Next.js, which puts Drizzle to use once the project’s data fetching needs come into play.

The Database First Approach

Say we already have a database and want to generate a Drizzle schema from it. (If you want to go in the opposite direction, stay tuned.)

To create our initial database, I’ve put together a script, which I’ll put here in its entirety.

CREATE DATABASE jira;

\c jira

CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    username VARCHAR(50),
    name VARCHAR(250),
    avatar VARCHAR(500)
);

CREATE TABLE tasks (
    id SERIAL PRIMARY KEY,
    name VARCHAR(250),
    epic_id INT,
    user_id INT
);

CREATE TABLE epics (
    id SERIAL PRIMARY KEY,
    name VARCHAR(250),
    description TEXT,
    due DATE
);

CREATE TABLE tags (
    id SERIAL PRIMARY KEY,
    name VARCHAR(250)
);

CREATE TABLE tasks_tags (
    id SERIAL PRIMARY KEY,
    task INT,
    tag INT
);

ALTER TABLE tasks
    ADD CONSTRAINT fk_task_user
    FOREIGN KEY (user_id)
    REFERENCES users (id);

ALTER TABLE tasks
    ADD CONSTRAINT fk_task_epic
    FOREIGN KEY (epic_id)
    REFERENCES epics (id);

ALTER TABLE tasks_tags
    ADD CONSTRAINT fk_tasks_tags_tag
    FOREIGN KEY (tag)
    REFERENCES tags (id);

ALTER TABLE tasks_tags
    ADD CONSTRAINT fk_tasks_tags_task
    FOREIGN KEY (task)
    REFERENCES tasks (id);

This will construct a basic database for a hypothetical Jira clone. We have tables for users, epics, tasks, and tags, along with various foreign keys connecting them. Assuming you have psql installed (it can be installed via libpq), you can execute that script from the command line like this:

PGPASSWORD=docker psql -h localhost -p 5432 -U docker -f database-creation-script.sql

Now run this command:

npx drizzle-kit pull

This tells Drizzle to look at our database and generate a schema from it.

Drizzle pull

Files generated

Inside the drizzle-schema folder there’s now a schema.ts file with our Drizzle schema. Here’s a small sample of it.

import { pgTable, serial, varchar, foreignKey, integer } from "drizzle-orm/pg-core";
import { sql } from "drizzle-orm";

export const users = pgTable("users", {
  id: serial().primaryKey().notNull(),
  username: varchar({ length: 50 }),
  name: varchar({ length: 250 }),
  avatar: varchar({ length: 500 }),
});

export const tasks = pgTable(
  "tasks",
  {
    id: serial().primaryKey().notNull(),
    name: varchar({ length: 250 }),
    epicId: integer("epic_id"),
    userId: integer("user_id"),
  },
  table => {
    return {
      fkTaskUser: foreignKey({
        columns: [table.userId],
        foreignColumns: [users.id],
        name: "fk_task_user",
      }),
      fkTaskEpic: foreignKey({
        columns: [table.epicId],
        foreignColumns: [epics.id],
        name: "fk_task_epic",
      }),
    };
  }
);

The users entity is a table with some columns. The tasks entity is a bit more interesting. It’s also a table with some columns, but we can also see some foreign keys being defined.

In Postgres, a foreign key merely creates a constraint that’s checked on inserts and updates, verifying that the value being set corresponds to a row in the target table. It has no effect on application code, so you might wonder why Drizzle saw fit to model it at all. The reason: Drizzle lets us subsequently modify our schema in code and generate an SQL file that makes the equivalent changes in the database. For that to work, Drizzle needs to be aware of things like foreign keys and indexes, so the schema in code and the database stay truly in sync, and Drizzle knows exactly what’s missing and needs to be created.

Relations

The other file Drizzle created is relations.ts. Here’s a bit of it:

import { relations } from "drizzle-orm/relations";

export const tasksRelations = relations(tasks, ({ one, many }) => ({
  user: one(users, {
    fields: [tasks.userId],
    references: [users.id],
  }),
  epic: one(epics, {
    fields: [tasks.epicId],
    references: [epics.id],
  }),
  tasksTags: many(tasksTags),
}));

export const usersRelations = relations(users, ({ many }) => ({
  tasks: many(tasks),
}));

This defines the relationships between tables (and is closely related to foreign keys). If you choose to use the Drizzle query API (the one that’s not SQL with types), Drizzle is capable of understanding that some tables have foreign keys into other tables, and allows you to pull down objects, with related objects in one fell swoop. For example, the tasks table has a user_id column in it, representing the user it’s assigned to. With the relationship set up, we can write queries like this:

const tasks = await db.query.tasks.findMany({
  with: {
    user: true,
  },
});

This will pull down all tasks, along with the user each is assigned to.
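Conceptually, the reshaping Drizzle does here is easy to picture: join the rows, then nest the related object onto each parent. Here’s a tiny, self-contained sketch of attaching users onto tasks in memory (purely illustrative; Drizzle’s actual implementation and row shapes differ):

```typescript
interface User { id: number; username: string | null }
interface Task { id: number; name: string | null; userId: number | null }

// Attach the matching user (if any) to each task, mimicking the
// nested result of db.query.tasks.findMany({ with: { user: true } }).
function withUser(tasks: Task[], users: User[]): (Task & { user: User | null })[] {
  const usersById = new Map(users.map(u => [u.id, u] as const));
  return tasks.map(t => ({
    ...t,
    user: t.userId != null ? usersById.get(t.userId) ?? null : null,
  }));
}
```

The real query does this with a SQL join rather than two round trips, but the resulting shape is the same: each task carries its related user object.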

Making Changes (Migrations)

With the code generation above, we’d now be capable of using Drizzle. But that’s not what this post is about. See my last post on Drizzle, or even just the Drizzle docs for guides on using it. This post is all about database migrations. So far, we took an existing database, and scaffolded a valid Drizzle schema. Now let’s run a script to add some things to the database, and see about updating our Drizzle schema.

We’ll add a new column to tasks called importance, and we’ll also add an index on the tasks table, on the epic_id column. This is unrelated to the foreign key we already have on this column. This is a traditional database index that would assist us in querying the tasks table on the epic_id column.

Here’s the SQL script we’ll run:

CREATE INDEX idx_tasks_epic ON tasks (epic_id);

ALTER TABLE tasks
    ADD COLUMN importance INT;

After running that script on our database, we’ll now run:

npx drizzle-kit pull

Our terminal should look like this:

Drizzle pull again

We can now see our schema updates in the git diffs:

Drizzle pull changes

Note the new column being added, and the new index being created. Again, the index will not affect our application code; it makes our Drizzle schema a faithful representation of our database, so we can make changes on either side and generate updates to the other. To that end, let’s see about updating our code, and generating SQL to match those changes.

The Code First Approach

Let’s go the other way. Let’s start with a Drizzle schema, and generate an SQL script from it. In order to get a Drizzle schema, let’s just cheat and grab the schema.ts and relations.ts files Drizzle created above. We’ll paste them into the drizzle-schema folder, and remove anything else Drizzle created: any snapshots, and anything in the meta folder Drizzle uses to track our history.

Next, since we want Drizzle to read our schema files, rather than just generate them, we need to tell Drizzle where they are. We’ll go back into our drizzle.config.ts file, and add this line:

schema: ["./drizzle-schema/schema.ts", "./drizzle-schema/relations.ts"],

Now run:

npx drizzle-kit generate

Voila! We have database assets being created.

Drizzle pull changes

The resulting SQL file is huge. Mine is named 0000_quick_wild_pack.sql (Drizzle adds these silly names to make the files stand out) and looks, in part, like this:

CREATE TABLE IF NOT EXISTS "epics" (
	"id" serial PRIMARY KEY NOT NULL,
	"name" varchar(250),
	"description" text,
	"due" date
);
--> statement-breakpoint
CREATE TABLE IF NOT EXISTS "tags" (
	"id" serial PRIMARY KEY NOT NULL,
	"name" varchar(250)
);
--> statement-breakpoint
CREATE TABLE IF NOT EXISTS "tasks" (
	"id" serial PRIMARY KEY NOT NULL,
	"name" varchar(250),
	"epic_id" integer,
	"user_id" integer
);

Making a schema change

Now let’s make some changes to our schema. Let’s add that same importance column to our tasks table, add that same index on epicId, and then, for fun, let’s tell Drizzle that our foreign key on userId should have an ON DELETE CASCADE rule, meaning that if we delete a user, the database will automatically delete all tasks assigned to that user. This would probably be an awful rule to add to real issue-tracking software, but it’ll help us see Drizzle in action.

Here are the changes:
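In schema.ts, the updated tasks table would look roughly like this (a hand-written sketch based on the description above; the users and epics tables are the ones Drizzle generated in the same file, and the exact index and constraint names here are assumptions):

```typescript
export const tasks = pgTable(
  "tasks",
  {
    id: serial().primaryKey().notNull(),
    name: varchar({ length: 250 }),
    epicId: integer("epic_id"),
    userId: integer("user_id"),
    importance: integer(), // new column
  },
  table => {
    return {
      // new index on epic_id
      idxTasksEpicId: index("idx_tasks_epicId").on(table.epicId),
      fkTaskUser: foreignKey({
        columns: [table.userId],
        foreignColumns: [users.id],
        name: "fk_task_user",
      }).onDelete("cascade"), // new cascade rule
      fkTaskEpic: foreignKey({
        columns: [table.epicId],
        foreignColumns: [epics.id],
        name: "fk_task_epic",
      }),
    };
  }
);
```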

And now we’ll run npx drizzle-kit generate again.

As before, Drizzle generated a new SQL file, this time called 0001_curved_warhawk.sql, which looks like this:

ALTER TABLE "tasks" DROP CONSTRAINT "fk_task_user";
--> statement-breakpoint
ALTER TABLE "tasks" ADD COLUMN "importance" integer;--> statement-breakpoint
DO $$ BEGIN
 ALTER TABLE "tasks" ADD CONSTRAINT "fk_task_user" FOREIGN KEY ("user_id") REFERENCES "public"."users"("id") ON DELETE cascade ON UPDATE no action;
EXCEPTION
 WHEN duplicate_object THEN null;
END $$;
--> statement-breakpoint
CREATE INDEX IF NOT EXISTS "idx_tasks_epicId" ON "tasks" USING btree ("epic_id");

It added the importance column, dropped and re-created the foreign key constraint to add our CASCADE rule, and created our index on epic_id.

Mixing & Matching Approaches

Make no mistake, you do not have to go all in on code-first or database-first. You can mix and match approaches. You can scaffold a Drizzle schema from a pre-existing database using drizzle-kit pull, then make changes to the code and generate SQL files that patch your database with those changes using drizzle-kit generate. Try it and see!

Going Further

Believe it or not, we’re only scratching the surface of what drizzle-kit can do. If you like what you’ve seen so far, be sure to check out the docs.

Concluding Thoughts

Drizzle is an incredibly exciting ORM. Not only does it add an impressive layer of static typing on top of SQL, letting you enjoy the power and flexibility of SQL with the type safety you already expect from TypeScript, but it also provides an impressive suite of commands for keeping your changing database and your ORM schema in sync.

]]>
https://frontendmasters.com/blog/drizzle-database-migrations/feed/ 0 4692
Loading Data with TanStack Router: react-query https://frontendmasters.com/blog/tanstack-router-data-loading-2/ https://frontendmasters.com/blog/tanstack-router-data-loading-2/#comments Thu, 21 Nov 2024 18:11:14 +0000 https://frontendmasters.com/blog/?p=4492 TanStack Query, commonly referred to as react-query, is an incredibly popular tool for managing client-side querying. You could create an entire course on react-query, and people have, but here we’re going to keep it brief so you can quickly get going.

Article Series

Essentially, react-query allows us to write code like this:

const { data, isLoading } = useQuery({
  queryKey: ["task", taskId],
  queryFn: async () => {
    return fetchJson("/api/tasks/" + taskId);
  },
  staleTime: 1000 * 60 * 2,
  gcTime: 1000 * 60 * 5,
});

The queryKey does what it sounds like: it lets you identify any particular key for a query. As the key changes, react-query is smart enough to re-run the query, which is contained in the queryFn property. As these queries come in, TanStack tracks them in a client-side cache, along with properties like staleTime and gcTime, which mean the same thing as they do in TanStack Router. These tools are built by the same people, after all.

There’s also a useSuspenseQuery hook which is the same idea, except instead of giving you an isLoading value, it relies on Suspense, and lets you handle loading state via Suspense boundaries.

This barely scratches the surface of Query. If you’ve never used it before, be sure to check out the docs.

We’ll move on and cover the setup and integration with Router, but we’ll stay high level to keep this post a manageable length.

Setup

We need to wrap our entire app with a QueryClientProvider which injects a queryClient (and cache) into our application tree. Putting it around the RouterProvider we already have is as good a place as any.

const queryClient = new QueryClient();

const Main: FC = () => {
  return (
    <>
      <QueryClientProvider client={queryClient}>
        <RouterProvider router={router} context={{ queryClient }} />
      </QueryClientProvider>
      <TanStackRouterDevtools router={router} />
    </>
  );
};

Recall from before that we also passed our queryClient to our Router’s context like this:

const router = createRouter({ 
  routeTree, 
  context: { queryClient }
});

And:

type MyRouterContext = {
  queryClient: QueryClient;
};

export const Route = createRootRouteWithContext<MyRouterContext>()({
  component: Root,
});

This allows us access to the queryClient inside of our loader functions via the Router’s context. If you’re wondering why we need loaders at all, now that we’re using react-query, stay tuned.

Querying

We used Router’s built-in caching capabilities for our tasks. For epics, let’s use react-query. Moreover, let’s use the useSuspenseQuery hook, since managing loading state via Suspense boundaries is extremely ergonomic. Better still, Suspense boundaries are exactly how Router’s pendingComponent works, so you can use useSuspenseQuery along with the same pendingComponent we looked at before!

Let’s add another (contrived) summary query in our epics layout (route) component.

export const Route = createFileRoute("/app/epics")({
  component: EpicLayout,
  pendingComponent: () => <div>Loading epics route ...</div>,
});

function EpicLayout() {
  const context = Route.useRouteContext();
  const { data } = useSuspenseQuery(epicsSummaryQueryOptions(context.timestarted));

  return (
    <div>
      <h2>Epics overview</h2>
      <div>
        {data.epicsOverview.map(epic => (
          <Fragment key={epic.name}>
            <div>{epic.name}</div>
            <div>{epic.count}</div>
          </Fragment>
        ))}
      </div>

      <div>
        <Outlet />
      </div>
    </div>
  );
}

To keep the code somewhat organized (and for other reasons we’ll get to), I stuck the query options into a separate module.

export const epicsSummaryQueryOptions = (timestarted: number) => ({
  queryKey: ["epics", "summary"],
  queryFn: async () => {
    const timeDifference = +new Date() - timestarted;
    console.log("Running api/epics/overview query at", timeDifference);
    const epicsOverview = await fetchJson<EpicOverview[]>("api/epics/overview");
    return { epicsOverview };
  },
  staleTime: 1000 * 60 * 5,
  gcTime: 1000 * 60 * 5,
});

A query key, and function, and some cache settings. I’m passing in the timestarted value from context, so we can see when these queries fire. This will help us detect waterfalls.

Let’s look at the root epics page (with a few details removed for space).

type SearchParams = {
  page: number;
};

export const Route = createFileRoute("/app/epics/")({
  validateSearch(search: Record<string, unknown>): SearchParams {
    return {
      page: parseInt(search.page as string, 10) || 1,
    };
  },
  loaderDeps: ({ search }) => {
    return { page: search.page };
  },
  component: Index,
  pendingComponent: () => <div>Loading epics ...</div>,
  pendingMinMs: 3000,
  pendingMs: 10,
});

function Index() {
  const context = Route.useRouteContext();
  const { page } = Route.useSearch();

  const { data: epicsData } = useSuspenseQuery(epicsQueryOptions(context.timestarted, page));
  const { data: epicsCount } = useSuspenseQuery(epicsCountQueryOptions(context.timestarted));

  return (
    <div className="p-3">
      <h3>Epics page!</h3>
      <h3>There are {epicsCount.count} epics</h3>
      <div>
        {epicsData.map((e, idx) => (
          <Fragment key={idx}>
            <div>{e.name}</div>
          </Fragment>
        ))}
        <div className="flex gap-3">
          <Link to="/app/epics" search={{ page: page - 1 }} disabled={page === 1}>
            Prev
          </Link>
          <Link to="/app/epics" search={{ page: page + 1 }} disabled={!epicsData.length}>
            Next
          </Link>
        </div>
      </div>
    </div>
  );
}

Two queries on this page: one to get the (paged) list of epics, another to get the total count of all the epics. Let’s run it.

It’s as silly as before, but it does show the three pieces of data we’ve fetched: the overview data we fetched in the epics layout, then the count of epics and the list of epics we loaded in the epics page beneath it.

What’s more, when we run this, we first see the pending component for our root route. That resolves quickly, and shows the main navigation, along with the pending component for our epics route. That resolves, showing the epics overview data, and then revealing the pending component for our epics page, which eventually resolves and shows the list and count of our epics.

Our component-level data fetching is working, and integrating, via Suspense, with the same Router pending components we already had. Very cool!

Let’s take a peek at our console, though, and look at all the logging we’ve been doing to track when these fetches happen.

The results are… awful. Component-level data fetching with Suspense feels really good, but if you’re not careful, these waterfalls are extremely easy to create. The problem is, when a component suspends while waiting for data, it prevents its children from rendering. This is precisely what’s happening here. The route is suspending, which prevents the child components (the page, and any other nested route components underneath it) from rendering, which in turn prevents those components’ fetches from starting.

There are two potential solutions here. We could ditch Suspense and use the useQuery hook instead, which does not suspend. That would require us to manually track multiple isLoading states (one for each useQuery hook) and coordinate the loading UX to go with them. For the epics page, we’d need to track both the count loading state and the epics list state, and not show our UI until both have returned. And so on, for every other page.

The other solution is to start pre-fetching these queries sooner.

We’ll go with option 2.

Prefetching

Remember previously we saw that loader functions all run in parallel. This is the perfect opportunity to start these queries off ahead of time, before the components even render. TanStack Query gives us an API to do just that.

To prefetch with Query, we take the queryClient object we saw before, and call queryClient.prefetchQuery and pass in the exact same query options and Query will be smart enough, when the component loads and executes useSuspenseQuery, to see that the query is already in flight, and just latch onto that same request. That’s also a big reason why we put those query options into the epicsSummaryQueryOptions helper function: to make it easier to reuse in the loader, to prefetch.
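That “latch onto that same request” behavior is essentially promise sharing keyed by the query key. Here’s a conceptual sketch (an illustration of the idea, not Query’s actual code):

```typescript
// Share one in-flight promise per serialized query key, so a prefetch
// and a later useSuspenseQuery resolve against the same request.
const inFlight = new Map<string, Promise<unknown>>();

function dedupedFetch<T>(queryKey: unknown[], queryFn: () => Promise<T>): Promise<T> {
  const key = JSON.stringify(queryKey);
  const existing = inFlight.get(key);
  if (existing) return existing as Promise<T>; // latch onto the in-flight request
  const promise = queryFn().finally(() => inFlight.delete(key));
  inFlight.set(key, promise);
  return promise;
}
```

The second caller never triggers a second network request; it just receives the promise the first caller created.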

Here’s the loader we’ll add to the epics route:

loader({ context }) {
  const queryClient = context.queryClient;
  queryClient.prefetchQuery(epicsSummaryQueryOptions(context.timestarted));
},

The loader receives the route tree’s context, from which it grabs the queryClient. From there, we call prefetchQuery and pass in the same options.

Let’s move on to the Epics page. To review, this is the relevant code from our Epics page:

function Index() {
  const context = Route.useRouteContext();
  const { page } = Route.useSearch();

  const { data: epicsData } = useSuspenseQuery(epicsQueryOptions(context.timestarted, page));
  const { data: epicsCount } = useSuspenseQuery(epicsCountQueryOptions(context.timestarted));
  
  // ..

We grab the current page from the URL, and the context, for the timestarted value. Now let’s do the same thing we just did, and repeat this code in the loader, to prefetch.

async loader({ context, deps }) {
  const queryClient = context.queryClient;

  queryClient.prefetchQuery(epicsQueryOptions(context.timestarted, deps.page));
  queryClient.prefetchQuery(epicsCountQueryOptions(context.timestarted));
},

Now when we check the console, we see something a lot nicer.

Fetching state

What happens when we page up? The page value will change in the URL, and Router will send the new page value down into our loader and our component. Then our useSuspenseQuery will execute with new query values, and suspend again. That means our existing list of epics will disappear, replaced by the “Loading epics” pending component. That would be a terrible UX.

Fortunately, React offers us a nice solution with the useDeferredValue hook (see the React docs). This allows us to “defer” a state change. If a state change causes our page to suspend, React will keep the existing UI in place, and the deferred value will simply hold the old value until the new data are ready. Let’s see it in action.

function Index() {
  const { page } = Route.useSearch();
  const context = Route.useRouteContext();

  const deferredPage = useDeferredValue(page);
  const loading = page !== deferredPage;

  const { data: epicsData } = useSuspenseQuery(
    epicsQueryOptions(context.timestarted, deferredPage)
  );
  const { data: epicsCount } = useSuspenseQuery(
    epicsCountQueryOptions(context.timestarted)
  );
 
  // ...

We wrap the changing page value in useDeferredValue, and just like that, our page does not suspend while the new query is in flight. To detect that a new query is running, we compare the real, correct page value with the deferredPage value. If they’re different, we know new data are loading, and we can display a loading spinner (or in this case, put an opacity overlay on the epics list).

Queries are re-used!

When using react-query for data management, we can now re-use the same query across different routes. Both the view epic and edit epic pages need to fetch info on the epic the user is about to view, or edit. Now we can define those options in one place, like we had before.

export const epicQueryOptions = (timestarted: number, id: string) => ({
  queryKey: ["epic", id],
  queryFn: async () => {
    const timeDifference = +new Date() - timestarted;

    console.log(`Loading api/epic/${id} data at`, timeDifference);
    const epic = await fetchJson<Epic>(`api/epics/${id}`);
    return epic;
  },
  staleTime: 1000 * 60 * 5,
  gcTime: 1000 * 60 * 5,
});

We can use them in both routes, and have them be cached in between (assuming we set the caching values to allow that). You can try it in the demo app: view an epic, go back to the list, then edit the same epic (or vice versa). Only the first of those pages you visit should cause the fetch to happen in your network tab.

Updating with react-query

Just like with tasks, epics have a page where we can edit an individual epic. Let’s see what the saving logic looks like with react-query.

Let’s quickly review the query keys for the epics queries we’ve seen so far. For an individual epic, it was:

export const epicQueryOptions = (timestarted: number, id: string) => ({
  queryKey: ["epic", id],

For the epics list, it was this:

export const epicsQueryOptions = (timestarted: number, page: number) => ({
  queryKey: ["epics", "list", page],

And the count:

export const epicsCountQueryOptions = (timestarted: number) => ({
  queryKey: ["epics", "count"],

Finally, the epics overview:

export const epicsSummaryQueryOptions = (timestarted: number) => ({
  queryKey: ["epics", "summary"],

Notice the pattern: epics followed by various things for the queries that affect multiple epics, and ["epic", epicId] for an individual epic. With that in mind, let’s see just how easy it is to invalidate these queries after a mutation:

const save = async () => {
  setSaving(true);
  await postToApi("api/epic/update", {
    id: epic.id,
    name: newName.current!.value,
  });

  queryClient.removeQueries({ queryKey: ["epics"] });
  queryClient.removeQueries({ queryKey: ["epic", epicId] });

  navigate({ to: "/app/epics", search: { page: 1 } });

  setSaving(false);
};

The magic is on the highlighted lines.

In one fell swoop, we remove all cached entries for any query whose key starts with "epics", or with ["epic", epicId], and Query will handle the rest. Now, when we navigate back to the epics page (or any page that used these queries), we’ll see the suspense boundary show while fresh data are loaded. If you’d prefer to keep stale data on the screen while the fresh data load, that’s fine too: just use queryClient.invalidateQueries instead. And if you’d like to detect that a query is re-fetching in the background, so you can display an inline spinner, use the isFetching property returned from useSuspenseQuery.

const { data: epicsData, isFetching } = useSuspenseQuery(
  epicsQueryOptions(context.timestarted, deferredPage)
);
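The reason removeQueries({ queryKey: ["epics"] }) also catches keys like ["epics", "list", 1] is that Query matches query keys by prefix. That matching rule can be sketched like this (an illustration of the rule, not Query’s real implementation):

```typescript
// A query key "filter" matches any cached key it is a prefix of.
function matchesKey(filter: unknown[], cachedKey: unknown[]): boolean {
  return filter.every((part, i) => JSON.stringify(part) === JSON.stringify(cachedKey[i]));
}
```

This is why the naming convention above matters: putting the broad term first ("epics", then "list", then the page) is what lets one filter sweep away every related query.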

Odds and ends

We’ve gone pretty deep on TanStack Router and Query. Let’s take a look at one last trick.

If you recall, pending components ship with a related pendingMinMs setting that forces a pending component to stay on the page a minimum amount of time, even if the data were ready. This was to avoid a jarring flash of a loading state. We also saw that TanStack Router uses Suspense to show those pending components, which means that react-query’s useSuspenseQuery will seamlessly integrate with it. Well, almost seamlessly. Router can only apply the pendingMinMs value to the promise we return from the Router’s loader. But now we don’t really return any promise from the loader; we prefetch some things, and rely on component-level data fetching to do the real work.

Well, there’s nothing stopping you from doing both! Right now our loader looks like this:

async loader({ context, deps }) {
  const queryClient = context.queryClient;

  queryClient.prefetchQuery(epicsQueryOptions(context.timestarted, deps.page));
  queryClient.prefetchQuery(epicsCountQueryOptions(context.timestarted));
},

Query also ships with a queryClient.ensureQueryData method, which can load query data, and return a promise for that request. Let’s put it to good use so we can use pendingMinMs again.

One thing you do not want to do is this:

await queryClient.ensureQueryData(epicsQueryOptions(context.timestarted, deps.page));
await queryClient.ensureQueryData(epicsCountQueryOptions(context.timestarted));

That will block on each request, serially. In other words, a waterfall. Instead, to kick off both requests immediately and wait on them in the loader (without a waterfall), you can do this:

await Promise.allSettled([
  queryClient.ensureQueryData(epicsQueryOptions(context.timestarted, deps.page)),
  queryClient.ensureQueryData(epicsCountQueryOptions(context.timestarted)),
]);

This works, and it keeps the pending component on the screen for the duration of pendingMinMs.

You won’t always, or even usually need to do this. But it’s handy for when you do.
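The difference between the two shapes is easy to demonstrate with plain promises, using fake fetches that stand in for ensureQueryData:

```typescript
const log: string[] = [];

function fakeFetch(name: string): Promise<string> {
  log.push(`start:${name}`); // record when the request is kicked off
  return Promise.resolve(name);
}

// Waterfall: "count" does not even start until "epics" has resolved.
async function serial() {
  await fakeFetch("epics");
  await fakeFetch("count");
}

// Parallel: both requests start before either resolves.
async function parallel() {
  await Promise.allSettled([fakeFetch("epics"), fakeFetch("count")]);
}
```

Calling parallel() logs both start entries immediately, while serial() only starts the second request after the first completes. That gap is exactly the waterfall we’re avoiding.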

Wrapping up

This has been a whirlwind tour of TanStack Router and TanStack Query, but hopefully not an overwhelming one. These tools are incredibly powerful, and offer the ability to do just about anything. I hope this post will help some people put them to good use!

Article Series

]]>
https://frontendmasters.com/blog/tanstack-router-data-loading-2/feed/ 2 4492
Loading Data with TanStack Router: Getting Going https://frontendmasters.com/blog/tanstack-router-data-loading-1/ https://frontendmasters.com/blog/tanstack-router-data-loading-1/#respond Wed, 20 Nov 2024 18:52:19 +0000 https://frontendmasters.com/blog/?p=4465 TanStack Router is one of the most exciting projects in the web development ecosystem right now, and it doesn’t get nearly enough attention. It’s a fully fledged client-side application framework that supports advanced routing, nested layouts, and hooks for loading data. Best of all, it does all of this with deep type safety.

Article Series

This post is all about data loading. We’ll cover the built-in hooks TanStack Router ships with to load and invalidate data. Then we’ll cover how easily TanStack Query (also known as react-query) integrates and see what the tradeoffs of each are.

The code for everything we’re covering is in this GitHub repo. As before, I’m building an extremely austere, imaginary Jira knockoff. There’s nothing useful in that repo beyond the bare minimum needed for us to take a close look at how data loading works. If you’re building your own thing, be sure to check out the DevTools for TanStack Router. They’re outstanding.

The app does load actual data via SQLite, along with some forced delays, so we can more clearly see (and fix) network waterfalls. If you want to run the project, clone it, run npm i, and then open two terminals. In the first, run npm run server, which will create the SQLite database, seed it with data, and set up the API endpoints to fetch and update data. In the second, run npm run dev to start the main project, which will be on http://localhost:5173/. There are some (extremely basic) features to edit data. If at any point you want to reset the data, just restart the server task in your terminal.

The app is contrived. It exists to show Router’s capabilities. We’ll often have odd use cases, and frankly questionable design decisions. This was purposeful, in order to simulate real-world data loading scenarios, without needing a real-world application.

But what about SSR?

Router is essentially a client-side framework. There are hooks to get SSR working, but they’re very much DIY. If this disappoints you, I’d urge just a bit of patience. TanStack Start (now in Beta) is a new project that, for all intents and purposes, adds SSR capabilities to the very same TanStack Router we’ll be talking about. What makes me especially excited about TanStack Start is that it adds these server-side capabilities in a very non-intrusive way, which does not change or invalidate anything we’ll be talking about in this post (or talked about in my last post on Router, linked above). If that’s not entirely clear and you’d like to learn more, stay tuned for my future post on TanStack Start.

The plan

TanStack Router is an entire application framework. You could teach an entire course on it, and indeed there’s no shortage of YouTube videos out there. This blog post would turn into a book if we tried to cover each and every option in depth.

In this post we’ll cover the most relevant features and show code snippets where helpful. Refer to the docs for details. Also check out the repo, as all the examples we use in this post are fleshed out in their entirety there.

Don’t let the extremely wide range of features scare you. The vast majority of the time, some basic loaders will get you exactly what you need. We’ll cover some of the advanced features, too, so you know they’re there, if you ever do need them.

Starting at the top: context

When we create our router, we can give it “context.” This is global state. For our project, we’ll pass in our queryClient for react-query (which we’ll be using a little later). Passing the context in looks like this:

// main.tsx
import { createRouter } from "@tanstack/react-router";

import { QueryClient } from "@tanstack/react-query";

const queryClient = new QueryClient();

// Import the generated route tree
import { routeTree } from "./routeTree.gen";

const router = createRouter({ 
  routeTree, 
  context: { queryClient } 
});

Then we’ll make sure Router integrates what we put on context into the static types. We do this by creating our root route like this:

// routes/__root.tsx
export const Route = createRootRouteWithContext<MyRouterContext>()({
  component: Root,
});

This context will be available to all routes in the tree, and inside API hooks like loader, which we’ll get to shortly.

Adding to context

Context can change. We set up truly global context when we start Router up at our application’s root, but different locations in the route tree can add new things to context, which will be visible from there, downward in the tree. There are two places for this: the beforeLoad function, and the context function. Yes, routes can take a context function that modifies the route tree’s context value.

beforeLoad

The beforeLoad method always runs, on each active route, anytime the URL changes in any way. This is a good place to check preconditions and redirect. If you return a value from here, that value will be merged into the router’s context, and visible from that route downward. This function blocks all loaders from running, so be extremely careful what you do in here. Data loading should generally be avoided unless absolutely needed, since any loaders will wait until this function is complete, potentially creating waterfalls.
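The cost is easy to picture with a small, library-free timing sketch (the function names and the 20ms delays here are invented for illustration; this is not Router code):

```typescript
// Simulate a blocking precondition vs. fully parallel work.
const delay = (ms: number) => new Promise<void>(r => setTimeout(r, ms));

// Run a fake async task, recording when it starts and ends.
async function timed(label: string, ms: number, log: string[]) {
  log.push(`${label} start`);
  await delay(ms);
  log.push(`${label} end`);
}

// Waterfall: loaders can't start until beforeLoad resolves.
async function waterfall(log: string[]) {
  await timed("beforeLoad", 20, log);
  await Promise.all([timed("loaderA", 20, log), timed("loaderB", 20, log)]);
}

// Parallel: everything kicks off at once.
async function parallel(log: string[]) {
  await Promise.all([
    timed("beforeLoad", 20, log),
    timed("loaderA", 20, log),
    timed("loaderB", 20, log),
  ]);
}
```

In the waterfall version the loaders only start after the precondition finishes, doubling the total time; in the parallel version all three starts are logged before anything completes.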

Here’s a good example of what to avoid, with an opportunity to see why. This beforeLoad fetches the current user, places it into context, and does a redirect if there is no user.

// routes/index.tsx
export const Route = createFileRoute("/")({
  async beforeLoad() {
    const user = await getCurrentUser();
    if (!user) {
      throw redirect({
        to: "/login",
      });
    }

    return { user };
  },

  // ...

We’ll be looking at some data loading in a bit, and measure what starts when. You can go into the getCurrentUser function and uncomment the artificial delay in there, and see it block everything. This is especially obvious if you’re running Router’s DevTools. You’ll see this path block, and only once ready, allow all loaders below to execute.

But this is a good enough example to show how this works. The user object is now in context, visible to routes beneath it.

A more realistic example would be to check for a logged-in cookie, optimistically assume the user is logged in, and rely on network calls we do in the loaders to detect a logged-out user, and redirect accordingly. To make things even more realistic, those loaders for the initial render would run on the server, and figure out if a user is actually logged out before we show the user anything; but that will wait for a future post on TanStack Start.

What we have is sufficient to show how the beforeLoad callback works.

Context (function)

There’s also a context function we can provide routes. This is a non-async function that also gives us an opportunity to add to context. But it runs much more conservatively. This function only runs when the URL changes in a way that’s relevant to that route. So for a route of, say, app/epics/$epicId, the context function will re-run when the epicId param changes. This might seem strange, but it’s useful for modifying the context, but only when the route has changed, especially when you need to put non-primitive values (objects and functions) onto context. These non-primitive values are always compared by reference, and therefore always unique against the last value generated. As a result, they would cause render churning if added in beforeLoad, since React would (incorrectly) think it needed to re-render a route when nothing had changed.
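A tiny sketch of the underlying JavaScript behavior makes the distinction clearer (nothing Router-specific here):

```typescript
// Nothing Router-specific here; just the equality rules that make
// beforeLoad a bad home for freshly created objects and functions.
const makeContext = () => ({ helper: (n: number) => n * 2 });

const a = makeContext();
const b = makeContext();

// Primitives compare by value, so returning the same number again
// still looks "unchanged"...
const x = 42;
const y = 42;
const sameNumber = x === y; // true

// ...but each call returns a brand new object, and objects and functions
// compare by reference. To React, this looks like changed context on
// every run, causing render churn.
const sameObject = a === b; // false
```

Because the context function only re-runs when a relevant part of the URL changes, it's a safe place to put these reference-typed values.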

For now, here’s some code in our root route to mark the time for when the initial render happens, so we can compare that to the timestamp of when various queries run in our tree. This will help us see, and fix network waterfalls.

// routes/__root.tsx
export const Route = createRootRouteWithContext<MyRouterContext>()({
  context({ location }) {
    const timeStarted = +new Date();
    console.log("");
    console.log("Fresh navigation to", location.href);
    console.log("-------------------");

    return { timestarted: timeStarted };
  },

  // ...

This code is in our root route, so it will never re-run, since there are no path parameters the root route depends on.

Now everywhere in our route tree will have a timestarted value that we can use to detect any delays from data fetches in our tree.

Loaders

Let’s actually load some data. Router provides a loader function for this. Any of our route configurations can accept a loader function, which we can use to load data. Loaders all run in parallel; it would be bad if a layout needed to finish loading its data before the path beneath it started. Loaders receive any path params on the route’s URL, any search params (querystring values) the route has subscribed to, the context, and a few other goodies, and load whatever data they need. Router will detect what you return, and allow components to retrieve that data via the useLoaderData hook — strongly typed.

Loader in a route

Let’s take a look at tasks.route.tsx.

This is a route that will run for any URL at all starting with /app/tasks. It will run for that path, for /app/tasks/$taskId, for /app/tasks/$taskId/edit, and so on.

export const Route = createFileRoute("/app/tasks")({
  component: TasksLayout,
  loader: async ({ context }) => {
    const now = +new Date();
    console.log(`/tasks route loader. Loading task layout info at + ${now - context.timestarted}ms since start`);

    const tasksOverview = await fetchJson<TaskOverview[]>("api/tasks/overview");
    return { tasksOverview };
  },
  gcTime: 1000 * 60 * 5,
  staleTime: 1000 * 60 * 2,
});

We receive the context, and grab the timestarted value from it. We request some overview data on our tasks, and send that data down.

The gcTime property controls how long old route data are kept in cache. So if we browse from tasks over to epics, and then come back in 5 minutes and 1 second, nothing will be there, and the page will load in fresh. staleTime controls how long a cached entry is considered “fresh.” This determines whether cached data are refetched in the background. Here it’s set to two minutes. This means if the user hits this page, then goes to the epics page, waits 3 minutes, then browses back to tasks, the cached data will show, while the tasks data is re-fetched in the background, and (if changed) update the UI.
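Those two settings combine into three possible states for a cached route. Here's a sketch of the rules just described, using the same two- and five-minute values (my own summary of the behavior, not Router's internal code):

```typescript
type CacheState = "fresh" | "stale" | "gone";

const STALE_TIME = 1000 * 60 * 2; // 2 minutes
const GC_TIME = 1000 * 60 * 5;    // 5 minutes

// Given how long ago a route's data was fetched, what happens when we
// navigate back? (A sketch of the rules, not Router internals.)
function cacheState(msSinceFetch: number): CacheState {
  if (msSinceFetch > GC_TIME) return "gone";     // nothing cached; full fresh load
  if (msSinceFetch > STALE_TIME) return "stale"; // cached data shows, refetch in background
  return "fresh";                                // cached data shows, no refetch
}
```

So returning after one minute shows cached data with no request, after three minutes shows cached data while refetching, and after five-plus minutes loads the page in fresh.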

You’re probably wondering if TanStack Router tells you this background re-fetch is happening, so you can show an inline spinner, and yes, you can detect this like so:

const { isFetching } = Route.useMatch();

Loader in a page

Now let’s take a look at the tasks page.

export const Route = createFileRoute("/app/tasks/")({
  component: Index,
  loader: async ({ context }) => {
    const now = +new Date();
    console.log(`/tasks/index path loader. Loading tasks at + ${now - context.timestarted}ms since start`);

    const tasks = await fetchJson<Task[]>("api/tasks");
    return { tasks };
  },
  gcTime: 1000 * 60 * 5,
  staleTime: 1000 * 60 * 2,
  pendingComponent: () => <div>Loading tasks list...</div>,
  pendingMs: 150,
  pendingMinMs: 200,
});

This is the route for the specific URL /app/tasks. If the user were to browse to /app/tasks/$taskId then this component would not run. This is a specific page, not a layout (which Router calls a “route”). Basically the same as before, except now we’re loading the list of tasks to display on this page.

We’ve added some new properties this time, though. The pendingComponent property allows us to render some content while the loader is working. We also specified pendingMs, which controls how long we wait before showing the pending component. Lastly, pendingMinMs allows us to force the pending component to stay on the screen for a specified amount of time, even if the data are ready. This can be useful to avoid a brief flash of a loading component, which can be jarring to the user.

If you’re wondering why we’d even want to use pendingMs to delay a loading screen, it’s for subsequent navigations. Rather than immediately transition from the current page to a new page’s loading component, this setting lets us stay on the current page for a moment, in the hopes that the new page will be ready quickly enough that we don’t have to show any pending component at all. Of course, on the initial load, when the web app first starts up, these pendingComponents do show immediately, as you’d expect.
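To make the interplay of these three settings concrete, here's a sketch that models when the pending component would be visible (my own model of the behavior described above, not Router's implementation):

```typescript
// Returns the [showAt, hideAt] window (in ms) during which the pending
// component is visible, or null if it never appears because the data
// arrived before pendingMs elapsed. A model of the described behavior,
// not Router's actual code.
function pendingWindow(
  loadDuration: number,
  pendingMs: number,
  pendingMinMs: number
): [number, number] | null {
  if (loadDuration <= pendingMs) return null; // data beat the spinner
  const showAt = pendingMs;
  // Stay up at least pendingMinMs, even if the data is already in.
  const hideAt = Math.max(loadDuration, showAt + pendingMinMs);
  return [showAt, hideAt];
}
```

With the values above, a 100ms load never shows the spinner at all, while a 180ms load shows it at 150ms and holds it until 350ms so it doesn't flash.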

Let’s run our tasks page.

It’s ugly, and frankly useless, but it works. Now let’s take a closer look.

Loaders running in parallel

If we peek in our console, we should see something like this:

If you have DevTools open, you should see something like below. Note how the route and page load and finish in parallel.

As we can see, these requests started a mere millisecond apart from each other, since the loaders run in parallel. (This isn’t the real Jira, so I had to manually add a 750ms delay to each of the API endpoints.)

Different routes using the same data

If we look at the loader for the /app/tasks/$taskId route, and the loader for the /app/tasks/$taskId/edit route, we see the same fetch call:

const task = await fetchJson<Task>(`api/tasks/${taskId}`);

This is because we need to load the actual task in order to display it, or in order to display it in a form for the user to make changes. Unfortunately though, if you click the edit button for any task, then go back to the tasks list (without saving anything), then click the edit button for the same task, you should notice the same exact data being requested. This makes sense. Both loaders happen to make the same fetch() call, but there’s nothing in our client to cache the call. This is probably fine 99% of the time, but this is one of the many things react-query will improve for us, in a bit.

Updating data

If you click the edit button for any task, you should be brought to a page with an extremely basic form that will let you edit the task’s name. Once we click save, we want to navigate back to the tasks list, but most importantly, we need to tell Router that we’ve changed some data, and that it will need to invalidate some cached entries, and re-fetch when we go back to those routes.

This is where Router’s built-in capabilities start to get stretched, and where we might start to want react-query (discussed in part 2 of this post). Router will absolutely let you invalidate routes, to force re-fetches. But the API is fairly simple, and fine-grained. We basically have to describe each route we want invalidated (or removed). Let’s take a look:

import { useRouter } from "@tanstack/react-router";

// ...

const router = useRouter();
const save = async () => {
  await postToApi("api/task/update", {
    id: task.id,
    title: newTitleEl.current!.value,
  });

  router.invalidate({
    filter: route => {
      return (
        route.routeId == "/app/tasks/" ||
        (route.routeId === "/app/tasks/$taskId/" && route.params.taskId === taskId) ||
        (route.routeId === "/app/tasks_/$taskId/edit" && route.params.taskId === taskId)
      );
    },
  });

  navigate({ to: "/app/tasks" });
};

Note the call to router.invalidate. This tells Router to mark any cached entries matching that filter as stale, causing us to re-fetch them the next time we browse to those paths. We could also pass absolutely nothing to that same invalidate method, which would tell Router to invalidate everything.

Here we invalidated the main tasks list, as well as the view and edit pages, for the individual task we just modified.
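Conceptually, that filter is just a predicate run over the cached route matches. Here's a sketch of the idea using simplified, made-up shapes (not Router's actual internals):

```typescript
// Simplified stand-ins for Router's cached route matches.
type CachedMatch = {
  routeId: string;
  params: Record<string, string>;
  stale: boolean;
};

// Run the predicate over every cached match and flag the hits as stale,
// so they re-fetch on the next visit. A conceptual sketch only.
function invalidateMatching(
  cache: CachedMatch[],
  filter: (m: CachedMatch) => boolean
): void {
  for (const match of cache) {
    if (filter(match)) match.stale = true;
  }
}
```

Passing no filter at all would be equivalent to a predicate that always returns true: everything gets marked stale.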

Now when we navigate back to the main tasks page we’ll immediately see the prior, now-stale data, but new data will fetch, and update the UI when present. Recall that we can use const { isFetching } = Route.useMatch(); to show an inline spinner while this fetch is happening.

If you’d prefer to completely remove the cache entries, and have the task page’s “Loading” component show, then you can use router.clearCache instead, with the same exact filter argument. That will remove those cache entries completely, forcing Router to completely re-fetch them, and show the pending component. This is because there is no longer any stale data left in the cache; clearCache removed it.

There is one small caveat though: Router will prevent you from clearing the cache for the page you’re on. That means we can’t clear the cache for the edit task page, since we’re sitting on it already. To be clear, when we call clearCache, the filter function won’t even look at the route you’re on; the ability to remove it simply does not exist.

Instead, you could do something like this:

router.clearCache({
  filter: route => {
    return route.routeId == "/app/tasks/" || (route.routeId === "/app/tasks_/$taskId/edit" && route.params.taskId === taskId);
  },
});

router.invalidate({
  filter: route => {
    return route.routeId === "/app/tasks_/$taskId/edit" && route.params.taskId === taskId;
  },
});

But really, at this point you should probably be looking to use react-query, which we’ll cover in the next post.


Introducing Drizzle

https://frontendmasters.com/blog/introducing-drizzle/ (Mon, 17 Jun 2024)

This is a post about an exciting new ORM tool (that’s “object relational mapper”) that is different than any ORM I’ve used before—and I’ve used quite a few! Spoiler: it’s Drizzle.

Wait, what’s an ORM?

Even if you’re using a non-relational database now (think MongoDB or Redis), sooner or later you’ll likely need a relational DB.

Knowing SQL is an essential skill for any software engineer, but writing SQL directly can be tricky! The tooling is usually primitive, with only minimal auto-complete to guide you, and you invariably go through a process of running your query, correcting errors, and repeating until you get it right.

ORMs try to help you with this process of crafting SQL. Typically, you tell the ORM about the shape of your DB and it exposes APIs to do typical things. If you have a books table in your DB, an ORM will give you an API for it where you can do stuff like:

const longBooks = books.find({ pages: { gt: 500 } });

Behind the scenes, the SQL created might be something like:

SELECT * FROM books WHERE pages > 500;

That may not look like a massive simplification, but as the parameters get more complex or strung together, the SQL can get a bit mindbending. Not to mention the ORM keeps that code in a language you’re likely already using, like JavaScript.

This ease of use may seem nice, but it can cause other problems. For example, you might struggle to figure out how to do non-trivial queries. And there are performance foot-guns, such as the infamous select N+1 problem, which you might cause without realizing it due to the abstracted-away syntax.
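Here's the select N+1 foot-gun in miniature, with a fake "ORM" that just counts the queries it runs (everything here is invented for illustration):

```typescript
// A fake "database" that counts how many queries get executed.
let queriesRun = 0;
const fakeDb = {
  findBooks(): { id: number }[] {
    queriesRun++; // SELECT * FROM books
    return [{ id: 1 }, { id: 2 }, { id: 3 }];
  },
  findAuthorsForBook(_bookId: number): string[] {
    queriesRun++; // SELECT * FROM authors WHERE bookId = ?
    return ["someone"];
  },
  findAuthorsForBooks(_bookIds: number[]): string[] {
    queriesRun++; // SELECT * FROM authors WHERE bookId IN (?, ?, ?)
    return ["someone"];
  },
};

// Looks innocent, but runs 1 query for the books plus N more: one per book.
function naive() {
  const books = fakeDb.findBooks();
  for (const book of books) fakeDb.findAuthorsForBook(book.id);
}

// The set-based version: 2 queries total, no matter how many books.
function batched() {
  const books = fakeDb.findBooks();
  fakeDb.findAuthorsForBooks(books.map(b => b.id));
}
```

With an ORM's lazy-loading syntax, the naive loop is often invisible in your code, which is exactly why it's a foot-gun.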

Why Drizzle is Different

Drizzle takes a novel approach. It does provide a traditional ORM querying API, like we saw above. But in addition, it also provides an API that is essentially a layer of typing on top of SQL itself. So rather than what we saw before, we might query our books table like this:

const longBooks = await db
  .select()
  .from(books)
  .where(gt(books.pages, 500));

It’s more lines, but it’s closer to actual SQL, which provides us some nice benefits: it’s easier to learn, more flexible, and avoids traditional ORM footguns.

Let’s dive in and look closer. This post will give a brief overview of setting up Drizzle and querying, and then do a deeper dive showing off some of its powerful abilities with this typed SQL querying API. The docs are here if you’d like to look closer at anything.

Using Drizzle in general, and some of the advanced things we’ll cover in this post requires a decent knowledge of SQL. If you’ve never, ever used SQL, you might struggle with a few of the things we discuss later on. That’s expected. Skim and jump over sections as needed. If nothing else, hopefully this post will motivate you to look at SQL.

Setting up the Schema

Drizzle can’t do much of anything if it doesn’t know about your database. There are lots of utilities for showing Drizzle the structure (or schema) of your tables. We’ll take a very brief look, but a more complete example can be found here.

Drizzle supports Postgres, MySQL, and SQLite. The ideas are the same regardless, but we’ll be using MySQL.

Let’s start to set up a table.

import { int, json, mysqlTable, varchar } from "drizzle-orm/mysql-core";

export const books = mysqlTable("books", {
  id: int("id").primaryKey().autoincrement(),
  userId: varchar("userId", { length: 50 }).notNull(),
  isbn: varchar("isbn", { length: 25 }),
  pages: int("pages"),
});

We tell Drizzle about our columns (we won’t show all of them here), and their data types.

Now we can run queries:

const result = await db
  .select()
  .from(books)
  .orderBy(desc(books.id))
  .limit(1);

This query returns an array of items which match the schema we provided Drizzle for this table.

First Query

Alternatively, as expected, we can also narrow our select list.

const result = await db
  .select({ id: books.id, isbn: books.isbn })
  .from(books)
  .orderBy(desc(books.id))
  .limit(1);

Note that the types of the columns match whatever we define in the schema. We won’t go over every possible column type (check the docs), but let’s briefly look at the JSON type:

export const books = mysqlTable("books", {
  id: int("id").primaryKey().autoincrement(),
  userId: varchar("userId", { length: 50 }).notNull(),
  isbn: varchar("isbn", { length: 25 }),
  pages: int("pages"),
  authors: json("authors"),
});

This adds an authors field to each book. But the type might not be what you want. Right now it’s unknown. This makes sense: JSON can have just about any structure. Fortunately, if you know your json column will have a predictable shape, you can specify it, like this:  

export const books = mysqlTable("books", {
  id: int("id").primaryKey().autoincrement(),
  userId: varchar("userId", { length: 50 }).notNull(),
  isbn: varchar("isbn", { length: 25 }),
  pages: int("pages"),
  authors: json("authors").$type<string[]>(),
});

And now, when we check, the authors property is of type string[] | null.

Typed JSON

If you were to mark the authors column as notNull() it would be typed as string[]. As you might expect, you can pass any type you’d like into the $type helper.

Query Whirlwind Tour

Let’s run a non-trivial, but still basic query to see what Drizzle looks like in practice. Let’s say we’re looking to find some nice beach reading for the summer. We want to find books that belong to you (userId == “123”) and are either less than 150 pages or were written by Stephen Jay Gould. We want the first ten, sorted from most recently added to least recently added (the id key is auto-numbered, so we can sort on it for the same effect).

In SQL we’d do something like this:

SELECT *
FROM books
WHERE userId = '123' AND (pages < 150 OR authors LIKE '%Stephen Jay Gould%')
ORDER BY id desc
LIMIT 10

With Drizzle we’d write this:

const result = await db
  .select()
  .from(books)
  .where(
    and(
      eq(books.userId, userId),
      or(lt(books.pages, 150), like(books.authors, "%Stephen Jay Gould%"))
    )
  )
  .orderBy(desc(books.id))
  .limit(10);

Which works!

[
  {
    "id": 1088,
    "userId": "123",
    "authors": ["Siry, Steven E"],
    "title": "Greene: Revolutionary General (Military Profiles)",
    "isbn": "9781574889130",
    "pages": 144
  },
  {
    "id": 828,
    "userId": "123",
    "authors": ["Morton J. Horwitz"],
    "title": "The Warren Court and the Pursuit of Justice",
    "isbn": "0809016257",
    "pages": 144
  },
  {
    "id": 506,
    "userId": "123",
    "authors": ["Stephen Jay Gould"],
    "title": "Bully for Brontosaurus: Reflections in Natural History",
    "isbn": "039330857X",
    "pages": 544
  },
  {
    "id": 412,
    "userId": "123",
    "authors": ["Stephen Jay Gould"],
    "title": "The Flamingo's Smile: Reflections in Natural History",
    "isbn": "0393303756",
    "pages": 480
  },
  {
    "id": 356,
    "userId": "123",
    "authors": ["Stephen Jay Gould"],
    "title": "Hen's Teeth and Horse's Toes: Further Reflections in Natural History",
    "isbn": "0393311031",
    "pages": 416
  },
  {
    "id": 319,
    "userId": "123",
    "authors": ["Robert J. Schneller"],
    "title": "Cushing: Civil War SEAL (Military Profiles)",
    "isbn": "1574886967",
    "pages": 128
  }
]

The Drizzle version was actually a little bit longer. But we’re not optimizing for fewest possible lines of code. The Drizzle version is typed, with autocomplete to guide you toward a valid query, and TypeScript to warn you when you miss. The query is also a lot more composable. What do I mean by that?

Composability: Putting Queries Together

Let’s write something slightly more advanced and slightly more realistic. Let’s code up a function that takes any number of search filters, and puts together a query. Here’s what the filters look like

type SearchPacket = Partial<{
  title: string;
  author: string;
  maxPages: number;
  subjects?: number[];
}>;

Note the Partial type. We’re taking in any number of these filters—possibly none of them. Whichever filters are passed, we want them to be additive; we want them combined with and. We’ve seen and already, and it can take the result of calls to eq, lt, and lots of others. We’ll need to create an array of all of these filters, and Drizzle gives us a parent type that can hold any of them: SQLWrapper.

Let’s get started.

async function searchBooks(args: SearchPacket) {
  const searchConditions: SQLWrapper[] = [];
}

We’ve got our array of filters. Now let’s start filling it up.

async function searchBooks(args: SearchPacket) {
  const searchConditions: SQLWrapper[] = [];
  if (args.title) {
    searchConditions.push(like(books.title, `%${args.title}%`));
  }
}

Nothing new, yet. This is the same filter we saw before with authors.

Speaking of authors, let’s add that query next. But let’s make the author check a little more realistic. It’s not a varchar column; it holds JSON values, which themselves are arrays of strings. MySQL gives us a way to search JSON: the ->> operator. This takes a JSON column, and evaluates a path on it. So if you had objects in there, you’d pass string paths to get properties out. We just have an array of strings, so our path is $, which is the actual values in the array. And string comparisons when we’re filtering on JSON columns like this are no longer case-insensitive, so we’ll want to use the LOWER function in MySQL.

Typically, with traditional ORMs you’d scramble to the docs to look for an equivalent to the ->> operator, as well as the LOWER function. Drizzle does something better, and gives us a nice escape hatch to just write SQL directly in situations like this. Let’s implement our authors filter.

async function searchBooks(args: SearchPacket) {
  const searchConditions: SQLWrapper[] = [];
  if (args.title) {
    searchConditions.push(like(books.title, `%${args.title}%`));
  }
  if (args.author) {
    searchConditions.push(
      sql`LOWER(${books.authors}->>"$") LIKE ${`%${args.author.toLowerCase()}%`}`
    );
  }
}

Note the sql tagged template literal. It lets us put arbitrary SQL in for one-off operations that may not be implemented in the ORM. Before moving on, let’s take a quick peek at the SQL generated by this:

{
  "sql": "select `id`, `userId`, `authors`, `title`, `isbn`, `pages` from `books` where (`books`.`userId` = ? and LOWER(`books`.`authors`->>\"$\") LIKE ?) order by `books`.`id` desc limit ?",
  "params": ["123", "%gould%", 10]
}
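(A quick aside: the way a tagged template can separate SQL text from parameters is easy to demystify. Here's a toy tag showing just the mechanics. It's far simpler than Drizzle's real sql tag, which, as we're about to see, also recognizes column references and inlines them as identifiers rather than parameters.)

```typescript
// A toy sql tag: every interpolated value becomes a ? placeholder, and the
// values themselves are collected separately as params.
function toySql(strings: TemplateStringsArray, ...values: unknown[]) {
  let text = strings[0];
  const params: unknown[] = [];
  values.forEach((value, i) => {
    params.push(value);
    text += "?" + strings[i + 1];
  });
  return { text, params };
}

// Usage: toySql`pages > ${500}` yields { text: "pages > ?", params: [500] }
```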

Let’s zoom in on the authors piece. What we entered as…

sql`LOWER(${books.authors}->>"$") LIKE ${`%${args.author.toLowerCase()}%`}`

… gets transformed as:

LOWER(`books`.`authors`->>"$") LIKE ?

Our search term was parameterized; however, Drizzle was smart enough to not parameterize our column. I’m continuously impressed by small details like this. The maxPages piece is the same as before:

if (args.maxPages) {
  searchConditions.push(lte(books.pages, args.maxPages));
}

Nothing new or interesting. Now let’s look at the subjects filter. We can pass in an array of subject ids, and we want to filter books that have that subject. The relationship between books and subjects is stored in a separate table, booksSubjects. This table simply has rows with an id, a book id, and a subject id (and also the userId for that book, to make other queries easier).

So if book 12 has subject 34, there’ll be a row with bookId of 12, and subjectId of 34.

In SQL, when we want to see if a given row exists in some table, we use the exists keyword, and Drizzle has an exists function for this very purpose. Let’s move on with our function:

async function searchBooks(args: SearchPacket) {
  const searchConditions: SQLWrapper[] = [];
  if (args.title) {
    searchConditions.push(like(books.title, `%${args.title}%`));
  }
  if (args.author) {
    searchConditions.push(
      sql`LOWER(${books.authors}->>"$") LIKE ${`%${args.author.toLowerCase()}%`}`,
    );
  }
  if (args.maxPages) {
    searchConditions.push(lte(books.pages, args.maxPages));
  }
  if (args.subjects?.length) {
    searchConditions.push(
      exists(
        db
          .select({ _: sql`1` })
          .from(booksSubjects)
          .where(
            and(
              eq(books.id, booksSubjects.book),
              inArray(booksSubjects.subject, args.subjects),
            ),
          ),
      ),
    );
  }
}

We can pass an exists() call right into our list of filters, just like with real SQL. This bit:

_: sql`1`

… is curious, but that’s just us saying SELECT 1 which is a common way of putting something into a SELECT list, even though we’re not pulling back any data; we’re just checking for existence. Lastly, the inArray Drizzle helper is how we generate an IN query. Here’s what the generated SQL looks like for this subjects query:

select `id`, `userId`, `authors`, `title`, `isbn`, `pages`
from `books`
where (`books`.`userId` = ? and exists (select 1
                                        from `books_subjects`
                                        where (`books`.`id` = `books_subjects`.`book` and
                                               `books_subjects`.`subject` in (?, ?))))
order by `books`.`id` desc
limit ?
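Generating that placeholder list for the IN clause is simple enough to sketch (this is illustrative, not Drizzle's actual code):

```typescript
// Sketch of how an IN clause's placeholders get generated: one ? per
// value, with the values carried separately as params.
function inClause(column: string, values: number[]) {
  const placeholders = values.map(() => "?").join(", ");
  return {
    text: `${column} in (${placeholders})`,
    params: values,
  };
}
```

Keeping values as parameters like this, rather than interpolating them into the SQL string, is also what protects against SQL injection.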

That was our last filter. Now we can pipe our filters in to execute the query we put together.

async function searchBooks(args: SearchPacket) {
  const searchConditions: SQLWrapper[] = [];
  if (args.title) {
    searchConditions.push(like(books.title, `%${args.title}%`));
  }
  if (args.author) {
    searchConditions.push(
      sql`LOWER(${books.authors}->>"$") LIKE ${`%${args.author.toLowerCase()}%`}`
    );
  }
  if (args.maxPages) {
    searchConditions.push(lte(books.pages, args.maxPages));
  }
  if (args.subjects?.length) {
    searchConditions.push(
      exists(
        db
          .select({ _: sql`1` })
          .from(booksSubjects)
          .where(
            and(
              eq(books.id, booksSubjects.book),
              inArray(booksSubjects.subject, args.subjects)
            )
          )
      )
    );
  }

  const result = await db
    .select()
    .from(books)
    .where(and(eq(books.userId, userId), ...searchConditions))
    .orderBy(desc(books.id))
    .limit(10);
}

The ability to treat SQL queries as typed function calls that can be combined arbitrarily is what makes Drizzle shine.

Digging deeper

We could end the post here, but let’s go further and see how Drizzle handles something fairly complex. You might never need (or want to) write queries like this. My purpose in including this section is to show that you can, if you ever need to.

With that out of the way, let’s write a query to get aggregate info about our books. We want our most and least popular subject(s), and how many books we have with those subjects. We also want to know any unused subjects, as well as that same info about tags (which we haven’t talked about). And also the total number of books we have overall. This data might be displayed in a screen like this.

(Screenshot: a screen displaying aggregate stats for subjects, tags, and book counts.)

To keep this section manageable we’ll just get the book counts and the most and least subjects. The other pieces are variations on that theme. You can see the finished product here.

Let’s look at some of the SQL for this and how to write it with Drizzle.

Number of books per subject

In SQL we can group things together with GROUP BY.

SELECT
    subject,
    count(*)
FROM books_subjects
GROUP BY subject

Our SELECT list, rather than pulling items from a table, is now pulling from a (conceptual) lookup table. We (conceptually) have a bunch of buckets stored by subject id. So we can select those subject ids, as well as aggregate info from the buckets themselves, which we do with count(*). This selects each subject, and the number of books under that subject.

And it works:

(Screenshot: each subject id alongside its book count.)

But we want the most and least popular subjects. SQL also has what are called window functions. We can, on the fly, sort these buckets in some order, and then ask questions about the data, sorted in that way. We basically want the subject(s) with the highest or lowest number of books, including ties. It turns out RANK is exactly what we want. Let’s see how it works:

SELECT
    subject,
    count(*) as count,
    RANK() OVER (ORDER BY count(*) DESC) MaxSubject,
    RANK() OVER (ORDER BY count(*) ASC) MinSubject
FROM books_subjects
WHERE userId = '123'
GROUP BY subject

We ask for the rank of each row, when the whole result set is sorted in whatever way we describe.

(Screenshot: the query results, with MaxSubject and MinSubject rank columns alongside each subject’s count.)

Subjects 79, 137, and 150 all have a MinSubject rank of 1, which means they’re tied as the least-used subjects. That makes sense, since there’s only one book with each of those subjects.
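If the ranking rules feel slippery, here's RANK's behavior in miniature, as plain TypeScript (nothing to do with Drizzle): ties share a rank, and the next distinct value's rank skips past them.

```typescript
// RANK() semantics in miniature: each value's rank is its position in the
// sorted list (1-based), with ties sharing the first such position. That's
// why ranks can go 1, 1, 3 rather than 1, 1, 2.
function rank(counts: number[], order: "asc" | "desc"): number[] {
  const sorted = [...counts].sort((a, b) => (order === "desc" ? b - a : a - b));
  // indexOf finds the first occurrence, so tied values get the same rank.
  return counts.map(c => sorted.indexOf(c) + 1);
}
```

Running both orderings over the same counts mirrors the MaxSubject and MinSubject columns in the query above.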

It’s a little mind bendy at first, so don’t worry if this looks a little weird. The point is to show how well Drizzle can simplify SQL for us, not to be a deep dive into SQL, so let’s move on.

We want the subjects with a MaxSubject of 1, or a MinSubject of 1. We can’t use WHERE for this, at least not directly. The solution in SQL is to turn this query into a virtual table, and query that. It looks like this:

SELECT
    t.subject id,
    CASE WHEN t.MinSubject = 1 THEN 'MinSubject' ELSE 'MaxSubject' END as label,
    t.count
FROM (
    SELECT
        subject,
        count(*) as count,
        RANK() OVER (ORDER BY count(*) DESC) MaxSubject,
        RANK() OVER (ORDER BY count(*) ASC) MinSubject
    FROM books_subjects
    WHERE userId = '123'
    GROUP BY subject
) t
WHERE t.MaxSubject = 1 OR t.MinSubject = 1

And it works.

[Screenshot: filtered RANK query results]

Moving this along

We won't show tags, since it's basically identical except we hit a books_tags table instead of books_subjects. We also won't show unused subjects (or tags), which is also very similar, except we use a NOT EXISTS query.

The query to get the total number of books looks like this:

SELECT count(*) as count
FROM books
WHERE userId = '123'

Let’s add some columns to get it in the same structure as our subjects queries:

SELECT
    0 id,
    'Books Count' as label,
    count(*) as count
FROM books
WHERE userId = '123'

Now, since these queries return the same structure, let’s combine them into one big query. We use UNION for this.

SELECT *
FROM (
    SELECT
        t.subject id,
        CASE WHEN t.MinSubject = 1 THEN 'MinSubject' ELSE 'MaxSubject' END as label,
        t.count
    FROM (
        SELECT
            subject,
            count(*) as count,
            RANK() OVER (ORDER BY count(*) DESC) MaxSubject,
            RANK() OVER (ORDER BY count(*) ASC) MinSubject
        FROM books_subjects
        GROUP BY subject
    ) t
    WHERE t.MaxSubject = 1 OR t.MinSubject = 1
) subjects
UNION
    SELECT
        0 id,
        'Books Count' as label,
        count(*) as count
    FROM books
    WHERE userId = '123';

And it works! Phew!

[Screenshot: UNION query results]

But this is gross to write manually, and even grosser to maintain. There are a lot of pieces here, and there's no (good) way to break them apart and manage them separately. SQL is ultimately text, and you can, of course, generate these various pieces of text with different functions in your code and concatenate them together.

But that’s fraught with difficulty too. It’s easy to get small details wrong when you’re pasting strings of code together. And believe it or not, this query is much simpler than much of what I’ve seen.

The Drizzle Way

Let's see what this looks like in Drizzle. Remember that initial query to get each subject, with its count and rank? Here it is in Drizzle:

const subjectCountRank = () =>
  db
    .select({
      subject: booksSubjects.subject,
      count: sql<number>`COUNT(*)`.as("count"),
      rankMin: sql<number>`RANK() OVER (ORDER BY COUNT(*) ASC)`.as("rankMin"),
      rankMax: sql<number>`RANK() OVER (ORDER BY COUNT(*) DESC)`.as("rankMax"),
    })
    .from(booksSubjects)
    .where(eq(booksSubjects.userId, userId))
    .groupBy(booksSubjects.subject)
    .as("t");

Drizzle supports grouping, and it even has an as function to alias a query so it can be queried from. Let's do that next.

const subjectsQuery = () => {
  const subQuery = subjectCountRank();

  return db
    .select({
      label:
        sql<string>`CASE WHEN t.rankMin = 1 THEN 'MIN Subjects' ELSE 'MAX Subjects' END`.as(
          "label"
        ),
      count: subQuery.count,
      id: subQuery.subject,
    })
    .from(subQuery)
    .where(or(eq(subQuery.rankMin, 1), eq(subQuery.rankMax, 1)));
};

We stuck our query to get the ranks in a function, and then we just called that function, and queried from its result. SQL is feeling a lot more like normal coding!

The query for the total book count is simple enough.

db
  .select({ label: sql<string>`'All books'`, count: sql<number>`COUNT(*)`, id: sql<number>`0` })
  .from(books)
  .where(eq(books.userId, userId));

Hopefully we won’t be too surprised to learn that Drizzle has a union function, to union queries together. Let’s see it all together:

const dataQuery = union(
  db
    .select({
      label: sql<string>`'All books'`,
      count: sql<number>`COUNT(*)`,
      id: sql<number>`0`,
    })
    .from(books)
    .where(eq(books.userId, userId)),
  subjectsQuery()
);

Which generates this SQL for us:

(select 'All books', COUNT(*), 0 from `books` where `books`.`userId` = ?)
union
(select CASE WHEN t.rankMin = 1 THEN 'MIN Subjects' ELSE 'MAX Subjects' END as `label`, `count`, `subject`
 from (select `subject`,
              COUNT(*)                             as `count`,
              RANK() OVER (ORDER BY COUNT(*) ASC)  as `rankMin`,
              RANK() OVER (ORDER BY COUNT(*) DESC) as `rankMax`
       from `books_subjects`
       where `books_subjects`.`userId` = ?
       group by `books_subjects`.`subject`) `t`
 where (`rankMin` = ? or `rankMax` = ?))

Basically the same thing we wrote by hand before, but with a few more parens, plus the userId filtering I had left out of the handwritten version for clarity.

I left off the tags queries, and the unused subjects/tags queries, but if you’re curious what they look like, the code is all here and the final union looks like this:

const dataQuery = union(
  db
    .select({
      label: sql<string>`'All books'`,
      count: sql<number>`COUNT(*)`,
      id: sql<number>`0`,
    })
    .from(books)
    .where(eq(books.userId, userId)),
  subjectsQuery(),
  unusedSubjectsQuery(),
  tagsQuery(),
  unusedTagsQuery()
);

Just more function calls thrown into the union.

Flexibility

Some of you might wince at seeing that many large queries all UNIONed together. Those queries are actually run one after the other on the MySQL box. But for this project it's a small amount of data, and we avoid multiple round trips over the network to fetch it.

But let's say you decide you're better off breaking that union apart: sending N separate queries and putting the results together in application code. These queries are already separate function calls, so it would be fairly easy to remove them from the union and invoke them in isolation (and then modify your application code accordingly).
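As a sketch of that refactor — with stand-in async functions here, since the real Drizzle query builders need a live database connection — it might look something like this:

```typescript
// Stand-in query functions. In the real app these would be the Drizzle
// query builders from above, each awaited as its own database call.
const booksCountQuery = async () => [{ id: 0, label: "All books", count: 42 }];
const subjectsQuery = async () => [{ id: 79, label: "MIN Subjects", count: 1 }];

// Instead of one big UNION, fire the queries concurrently from the
// application and concatenate the results ourselves.
async function loadDashboardData() {
  const [bookCount, subjects] = await Promise.all([
    booksCountQuery(),
    subjectsQuery(),
  ]);
  return [...bookCount, ...subjects];
}
```

Promise.all keeps the queries concurrent, so you pay for the slowest query rather than the sum of all of them — though unlike the UNION, each one is now its own round trip to the database.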

This kind of flexibility is what I love most about Drizzle. Refactoring large, complex stored procedures has always been a pain with SQL. When you code it through Drizzle, it becomes much more like refactoring a typed programming language like TypeScript or C#.

Debugging Queries

Before we wrap up, let's take a look at how easily Drizzle lets you debug your queries. Let's say the query from earlier didn't return what we expected, and we want to see the actual SQL being run. We can do that by removing the await from the query, and then calling toSQL on the result.

import { and, desc, eq, like, lt, or } from "drizzle-orm";

const result = db
  .select()
  .from(books)
  .where(
    and(
      eq(books.userId, userId),
      or(lt(books.pages, 150), like(books.authors, "%Stephen Jay Gould%"))
    )
  )
  .orderBy(desc(books.id))
  .limit(10);

console.log(result.toSQL());

This displays the following:

{
  "sql": "select `id`, `userId`, `authors`, `title`, `isbn`, `pages` from `books` where (`books`.`userId` = ? and (`books`.`pages` < ? or `books`.`authors` like ?)) order by `books`.`id` desc limit ?",
  "params": ["123", 150, "%Stephen Jay Gould%", 10]
}

result.toSQL() returned an object with a sql field containing our query, and a params field with the parameters. As any ORM would, Drizzle parameterized our query, so values with special characters can't break the query or inject SQL. You can now run this query directly against your database to see what went wrong.

Wrapping Up

I hope you’ve enjoyed this introduction to Drizzle. If you’re not afraid of a little SQL, it can make your life a lot easier.

Prefetching When Server Loading Won’t Do
https://frontendmasters.com/blog/prefetching-when-server-loading-wont-do/
Wed, 15 May 2024 23:26:46 +0000

This is a post about a boring* topic: loading data.

(* Just kidding it will be amazing and engaging.)

Not how to load data, but instead we'll take a step back and look at where to load data. Not in any particular framework, either; this is going to be more broadly about data loading in different web application architectures, and particularly how that impacts performance.

We’ll start with client-rendered sites and talk about some of the negative performance characteristics they may have. Then we’ll move on to server-rendered apps, and then to the lesser-known out-of-order streaming model. To wrap up, we’ll talk about a surprisingly old, rarely talked about way to effectively load slow data in a server-rendered application. Let’s get started!

Client Rendering

Application metaframeworks like Next and SvelteKit have become incredibly popular. In addition to offering developer conveniences like file system-based routing and scaffolding of API endpoints, they also, more importantly, allow you to server render your application.

Why is server rendering so important? Let’s take a look at how the world looks with the opposite: client-rendered web applications, commonly referred to as “single page applications” or SPAs. Let’s start with a simplified diagram of what a typical request for a page looks like in an SPA.

The browser makes a request to your site. Let’s call it yoursite.io. With an SPA, it usually sends down a single, mostly empty HTML page, which has whatever script and style tags needed to run the site. This shell of a page might display your company logo, your static header, your copyright message in the footer, etc. But mostly it exists to load and run JavaScript, which will build the “real” site.

This is why these sites are called “single page” applications. There’s a single web page for the whole app, which runs code on the client to detect URL changes, and request and render whatever new UI is needed.

Back to our diagram. The initial web page was sent back from the web server as HTML. Now what? The browser will parse that HTML and find script tags. These script tags contain our application code, our JavaScript framework, etc. The browser will send requests back to the web server to load these scripts. Once the browser gets them back, it'll parse and execute them, and in so doing, begin executing your application code.

At this point whatever client-side router you're using (e.g. react-router, TanStack Router, etc.) will render your current page.

But there’s no data yet!

So you're probably displaying loading spinners or skeleton screens or the like. To get the data, your client-side code will now make yet another request to your server to fetch whatever data are needed, so you can display your real, finished page to your user. This could be via a plain old fetch, react-query, or whatever. Those details won't concern us here.

SSR To The Rescue

There is a pretty clear solution here. The server already has the URL of the request, so instead of only returning that shell page, it could (should) request the data as well, get the page all ready to go, and send down the complete page.

Somehow.

This is how the web always worked with tools like PHP or ASP.NET. But when your app is written with a client-side JavaScript framework like React or Svelte, it's surprisingly tricky. These frameworks all have APIs for stringifying a component tree into HTML on the server, so that markup can be sent down to the browser. But if a component in the middle of that tree needs data, how do you load it on the server, and then inject it where it's needed? How do you get the client to acknowledge that data, rather than re-request it? And of course, once you solve these problems and render your component tree, with data, on the server, you still need to re-render that tree on the client so your client-side code, like event handlers and such, starts working.

This act of re-rendering the app client side is called hydration. Once it’s happened, we say that our app is interactive. Getting these things right is one of the main benefits modern application meta-frameworks like Next and SvelteKit provide.
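One common pattern for that data hand-off — simplified here with made-up names; each framework has its own mechanism — is to serialize the server-fetched data right into the HTML so the client can pick it up during hydration instead of re-fetching:

```typescript
// Server side: embed the loaded data in the HTML as a global.
// (Real implementations also escape "</script>" and guard against XSS.)
function renderPage(appHtml: string, data: unknown): string {
  const payload = JSON.stringify(data);
  return `${appHtml}<script>window.__INITIAL_DATA__ = ${payload}</script>`;
}

// Client side: during hydration, read the embedded data off the global
// object instead of issuing a second request for it.
function readInitialData<T>(globalObj: { __INITIAL_DATA__?: T }): T | undefined {
  return globalObj.__INITIAL_DATA__;
}
```

The `__INITIAL_DATA__` name is just illustrative; frameworks generate their own serialization markers, but the shape of the trick is the same.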

Let’s take a look at what our request looks like in this server-rendered setup:

That’s great. The user sees the full page much, much sooner. Sure, it’s not interactive yet, but if you’re not shipping down obscene amounts of JavaScript, there’s a really good chance hydration will finish before the user can manage to click on any buttons.

We won’t get into all this, but Google themselves tell you this is much better for SEO as well.

So, what's the catch? Well, what if our data are slow to load? Maybe our database is busy. Maybe it's a huge request. Maybe there's a network hiccup. Or maybe you just depend on slow services you can't control. It's not rare.

This might be worse than the SPA we started with. Even though we needed multiple round trips to the server to get data, at least we were displaying a shell of a page quickly. Here, the initial request to the server will just hang and wait as long as needed for that data to load on the server, before sending down the full page. To the user, their browser (and your page) could appear unresponsive, and they might just give up and go back.

Out of Order Streaming

What if we could have the best of all worlds? We could server render, like we saw, but if some data are slow to load, ship the rest of the page with the data we have, and let the server push down the remaining data when ready. This is called streaming, or more precisely, out-of-order streaming (streaming without the out-of-order part is a separate, much more limited thing which we won't cover here).

Let's take a hypothetical example where the data abd and data xyz are slow to load.

With out-of-order streaming we can load the to-do data on the server, and send the page down to the user with just that data, immediately. The other two pieces of data haven't loaded yet, so our UI will display some manner of loading indicator. When the next piece of data is ready, the server pushes it down:

What’s the catch?

So does this solve all of our problems? Yes, but… only if the framework you're using supports it. To stream with the Next.js app directory you'll use Suspense components with RSC. With SvelteKit you just return a promise from your loader. Remix supports this too, with an API that's in the process of changing, so check their docs. SolidStart will also support this, but as of writing that entire project is still in beta, so check its docs when it comes out.
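For instance, the SvelteKit approach can be sketched like this — stub data functions stand in for real ones, and the loader's shape is the interesting part:

```typescript
// Sketch of SvelteKit-style streaming: awaited values block the initial
// HTML, while a returned (unawaited) promise streams down once it settles.
const getTodos = async () => ["write post"]; // fast data (stub)
const getSlowStats = () =>
  new Promise<{ count: number }>((resolve) =>
    setTimeout(() => resolve({ count: 42 }), 50) // slow data (stub)
  );

async function load() {
  return {
    todos: await getTodos(), // ready in the first response
    stats: getSlowStats(),   // streamed to the browser when it resolves
  };
}
```

The trick is entirely in what you await: anything awaited in the loader delays the initial HTML, while a raw promise lets the page ship now and fill in later.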

Some frameworks do not support this, like Astro and Next if you’re using the legacy pages directory.

What if we’re using those projects, and we have some dependencies on data which are slow to load? Are we stuck rendering this data in client code, after hydration?

Prefetching to the rescue

The web platform has a feature called prefetching. This lets us add a <link> tag to the <head> section of our HTML page, with a rel="prefetch" attribute and an href attribute of the URL we want to prefetch. We can put service endpoint calls here, so long as they use the GET verb. If you need to pre-fetch data from an endpoint that uses POST, you'll need to proxy it through an endpoint that uses GET. It's worth noting that you can also prefetch with an HTTP header if that's more convenient; see this post for more information.

When we do this, our page will start pre-fetching our resources as soon as the browser parses the link tag. Since it’s in the <head>, that means it’ll start pre-fetching at the same time our scripts and stylesheets are requested. So we no longer need to wait until our script tags load, parse, and hydrate our app. Now the data we need will start pre-fetching immediately. When hydration does complete, and our application code requests those same endpoints, the browser will be smart enough to serve that data from the prefetch cache.

Let’s see prefetching in action

To see pre-fetching in action, we’ll use Astro. Astro is a wonderful web framework that doesn’t get nearly enough attention. One of the very few things it can’t do is out-of-order streaming (for now). But let’s see how we can improve life with pre-fetching.

The repo for the code I'll be showing is here. It's not deployed anywhere, for fear of this blog post getting popular and me getting a big bill from Vercel. But the project has no external dependencies, so you can clone, install, and run it locally. You could also deploy it to Vercel yourself if you really want to see it in action.

I whipped up a very basic, very ugly web page that hits some endpoints to pull down a hypothetical list of books and some metadata about the library, and renders the books once ready. It looks like this:

The endpoints return static data, which is why there are no external dependencies. I added a manual delay of 700ms to these endpoints (sometimes you have slow services and there's nothing you can do about it), and I also installed and imported some large JavaScript libraries (d3, framer-motion, and recharts) to make sure hydration would take a moment or two, like in most production applications. And since these endpoints are slow, they're a poor candidate for server fetching.

So let’s request them client-side, see the performance of the page, and then add pre-fetching to see how that improves things.

The client-side fetching looks like this:

useEffect(() => {
  fetch("/api/books")
    .then((resp) => resp.json())
    .then((books) => {
      setBooks(books);
    });

  fetch("/api/books-count")
    .then((resp) => resp.json())
    .then((booksCountResp) => {
      setCount(booksCountResp.count);
    });
}, []);

Nothing fancy. Nothing particularly resilient here. Not even any error handling. But perfect for our purposes.

Network diagram without pre-fetching

Running this project, deployed to Vercel, my network diagram looks like this:

Notice all of the script and style resources, which need to be requested and processed before our client-side fetches start (on the last two lines).

Adding pre-fetching

I’ve added a second page to this project, called with-prefetch, which is the same as the index page. Except now, let’s see how we can add some <link> tags to request these resources sooner.

First, in the root layout, let's add this in the head section:

<slot name="head"></slot>

This gives us the ability (but does not require us) to add content to our HTML document's <head>. This is exactly what we need. Now we can make a PrefetchBooks React component:

import type { FC } from "react";

export const PrefetchBooks: FC<{}> = (props) => {
  return (
    <>
      <link rel="prefetch" href="/api/books" as="fetch" />
      <link rel="prefetch" href="/api/books-count" as="fetch" />
    </>
  );
};

Then render it in our prefetching page, like so:

<PrefetchBooks slot="head" />

Note the slot attribute on the React component, which tells Astro (not React) where to put this content.

With that, if we run the page, we'll see our link tags in the head:

Now let’s look at our updated network diagram:

Notice our endpoint calls now start immediately, on lines 3 and 4. Then later, in the last two lines, we see the real fetches being executed, at which point they just latch onto the prefetch calls already in flight.

Let’s put some hard numbers on this. When I ran a webpagetest mobile Lighthouse analysis on the version of this page without the pre-fetch, I got the following.

Note the LCP (Largest Contentful Paint) value. That's essentially telling us when the page looks finished to a user. Remember, the Lighthouse test simulates your site on the slowest mobile device imaginable, which is why it's 4.6 seconds.

When I re-run the same test on the pre-fetched version, things improved by about a second:

Definitely much better, but still not good; it never will be good until you can get your backend fast. Still, with some intelligent, targeted pre-fetching, you can at least improve things.

Parting thoughts

Hopefully all of your back-end data requirements will be forever fast in your developer journeys. But when they’re not, prefetching resources is a useful tool to keep in your toolbelt.
