Mounting Filesystem Data

The virtual filesystem

The Emscripten filesystem API provides a Unix-like virtual filesystem for the WebAssembly (Wasm) R process running in webR. This virtual filesystem has the ability to mount filesystem images or host directories so that the associated file and directory data is accessible to the Wasm R process.

Mounting images and directories in this way gives the Wasm R process access to arbitrary external data, potentially including datasets, scripts, or R packages pre-compiled for WebAssembly.

Emscripten’s API allows for several types of virtual filesystem, depending on the execution environment. The following filesystems are available for use with webR:

Filesystem    Description                                                  Web Browser   Node.js
WORKERFS¹     Mount Emscripten filesystem images.                          Yes           Yes
NODEFS        Mount existing host directories.                             No            Yes
IDBFS         Browser-based persistent storage using the IndexedDB API.    Yes²          No

Emscripten filesystem images

Emscripten filesystem images can be mounted using the WORKERFS filesystem type.

The file_packager tool, provided by Emscripten, takes a directory structure as input and produces webR-compatible filesystem images as output. The file_packager tool may be invoked from R using the rwasm R package:

> rwasm::file_packager("./input", out_dir = ".", out_name = "output")

It can also be invoked directly using its CLI³, if you prefer:

$ file_packager output.data --preload ./input@/ \
    --separate-metadata --js-output=output.js

In the above examples, the files in the directory ./input are packaged and an output filesystem image is created⁴, consisting of a data file, output.data, and a metadata file, output.js.metadata.

To prepare for mounting the filesystem image with webR, ensure that both files have the same basename (in this example, output) and are deployed to static file hosting⁵. The resulting URLs for the two files should differ only by the file extension.
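Since the two URLs differ only by their extension, the metadata URL can be computed from the data URL. The TypeScript below is a minimal sketch of that rule. metadataUrlFor() is a hypothetical helper, not part of webR, and it assumes the naming used on this page: output.data (or output.data.gz for a compressed image, as described under Compression) alongside output.js.metadata.

```typescript
// Hypothetical helper (not part of webR): derive the metadata URL of a
// filesystem image from the URL of its data file. Assumes the image was
// deployed as <basename>.data (or <basename>.data.gz if compressed) next to
// <basename>.js.metadata.
function metadataUrlFor(dataUrl: string): string {
  const url = new URL(dataUrl);
  const match = url.pathname.match(/^(.*)\.data(\.gz)?$/);
  if (!match) {
    throw new Error(`Expected a .data or .data.gz URL, got: ${dataUrl}`);
  }
  // Only the extension of the path changes; host and query string are kept.
  url.pathname = `${match[1]}.js.metadata`;
  return url.toString();
}
```

For example, metadataUrlFor('https://example.com/output.data') returns 'https://example.com/output.js.metadata'.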

Mount a filesystem image from URL

By default, the webr::mount() function downloads and mounts a filesystem image from a URL source, using the WORKERFS filesystem type.

webr::mount(
  mountpoint = "/data",
  source = "https://example.com/output.data"
)

A URL for the filesystem image .data file should be provided as the source argument, and the image will be mounted in the virtual filesystem under the path given by the mountpoint argument. If the mountpoint directory does not exist, it will be created prior to mounting.

Compression

Filesystem image .data files may optionally be gzip compressed prior to deployment. The file extension for compressed filesystem images should be .data.gz, and compression should be indicated by setting the property gzip: true on the metadata JSON stored in the .js.metadata file.
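The compression step can be scripted. The following is a sketch for Node.js, not an official tool. compressImage() is a hypothetical helper that gzips <base>.data into <base>.data.gz using Node's built-in zlib module and sets gzip: true in <base>.js.metadata. It assumes that file holds plain JSON, as produced by file_packager with --separate-metadata. Every other metadata field is left untouched, and the uncompressed .data file is not deleted.

```typescript
import { readFileSync, writeFileSync } from "node:fs";
import { gzipSync } from "node:zlib";

// Hypothetical helper (not part of webR or Emscripten): given the shared
// basename of an image, e.g. "dist/output", write "<base>.data.gz" and
// rewrite "<base>.js.metadata" with `gzip: true` set.
function compressImage(base: string): void {
  // Gzip the image data into a sibling file with the .data.gz extension.
  const data = readFileSync(`${base}.data`);
  writeFileSync(`${base}.data.gz`, gzipSync(data));

  // Flag the compression in the metadata JSON, keeping all other fields.
  const metadataPath = `${base}.js.metadata`;
  const metadata = JSON.parse(readFileSync(metadataPath, "utf8"));
  metadata.gzip = true;
  writeFileSync(metadataPath, JSON.stringify(metadata));
}
```

The compressed file keeps the original basename, so its URL and the metadata URL still differ only by extension.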

JavaScript API

WebR’s JavaScript API includes the WebR.FS.mount() function, a thin wrapper around Emscripten’s own FS.mount(). The JavaScript API provides more flexibility but requires a little more setup, including creating the mountpoint directory if it does not already exist.

The filesystem type should be provided as a string, with the options argument a JavaScript object of type FSMountOptions. The filesystem image data should be provided as a JavaScript Blob and the metadata as a JavaScript object deserialised from the underlying JSON content.

JavaScript

// Create mountpoint
await webR.FS.mkdir('/data');

// Download image data
const data = await fetch('https://example.com/output.data');
const metadata = await fetch('https://example.com/output.js.metadata');

// Mount image data
const options = {
  packages: [{
    blob: await data.blob(),
    metadata: await metadata.json(),
  }],
};
await webR.FS.mount('WORKERFS', options, '/data');

TypeScript

import type { FSMountOptions } from 'webr';

// Create mountpoint
await webR.FS.mkdir('/data');

// Download image data
const data = await fetch('https://example.com/output.data');
const metadata = await fetch('https://example.com/output.js.metadata');

// Mount image data
const options: FSMountOptions = {
  packages: [{
    blob: await data.blob(),
    metadata: await metadata.json(),
  }],
};
await webR.FS.mount('WORKERFS', options, '/data');

See the Emscripten FS.mount() documentation for further details about the structure of the options argument.
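One gap in the example above is that fetch() also resolves for HTTP error responses such as 404. A missing metadata file then only shows up later, as a JSON parse error or a failed mount. The sketch below is slightly more defensive: it fetches both files in parallel and checks response.ok before reading either body. loadImagePackage() is a hypothetical helper, not part of webR. Its result is assumed to have the same shape as one entry of the packages array used above, and fetchFn is a parameter only so the helper can be exercised with a stub.

```typescript
// Hypothetical helper (not part of webR): fetch a filesystem image and its
// metadata, failing early on HTTP errors. The result mirrors one entry of
// `options.packages` in the example above.
async function loadImagePackage(
  dataUrl: string,
  metadataUrl: string,
  fetchFn: typeof fetch = fetch,
): Promise<{ blob: Blob; metadata: any }> {
  // Start both downloads at once.
  const [dataResp, metadataResp] = await Promise.all([
    fetchFn(dataUrl),
    fetchFn(metadataUrl),
  ]);
  // fetch() does not reject on 4xx/5xx, so check the status explicitly.
  if (!dataResp.ok) {
    throw new Error(`Failed to fetch ${dataUrl}: HTTP ${dataResp.status}`);
  }
  if (!metadataResp.ok) {
    throw new Error(`Failed to fetch ${metadataUrl}: HTTP ${metadataResp.status}`);
  }
  // Image data as a Blob, metadata deserialised from JSON.
  return { blob: await dataResp.blob(), metadata: await metadataResp.json() };
}
```

The returned object can then be placed in the packages array of the options passed to webR.FS.mount('WORKERFS', options, '/data').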

Mount an existing host directory

The NODEFS filesystem type maps directories that exist on the host machine so that they are accessible in the WebAssembly process.

Warning

NODEFS is only available when running webR under Node.js.

To mount the directory ./extra on the virtual filesystem at /data, use either the JavaScript or R mount API with the filesystem type set to "NODEFS".

JavaScript

await webR.FS.mkdir('/data');
await webR.FS.mount('NODEFS', { root: './extra' }, '/data');

R

webr::mount(
  mountpoint = "/data",
  source = "./extra",
  type = "NODEFS"
)
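Node.js resolves a relative path such as ./extra against the current working directory of the Node.js process, not against the location of your script. The sketch below resolves the path up front and checks that it is an existing directory, so a typo fails with a clear message before anything is mounted. nodefsOptions() is a hypothetical helper, not part of webR; the { root } object it returns matches the options used in the JavaScript example above.

```typescript
import { statSync } from "node:fs";
import { resolve } from "node:path";

// Hypothetical helper (not part of webR): build NODEFS mount options from a
// host directory, resolving it to an absolute path first.
function nodefsOptions(hostDir: string): { root: string } {
  // Relative paths are resolved against process.cwd().
  const root = resolve(hostDir);
  let isDir = false;
  try {
    isDir = statSync(root).isDirectory();
  } catch {
    // statSync throws if the path does not exist; treat that as "not a directory".
  }
  if (!isDir) {
    throw new Error(`NODEFS root is not an existing directory: ${root}`);
  }
  return { root };
}
```

With this helper, the JavaScript example would pass nodefsOptions('./extra') as the options argument to webR.FS.mount('NODEFS', ..., '/data').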

IndexedDB Filesystem Storage

When using webR in a web browser, an IndexedDB-based persistent storage space can be mounted using the IDBFS filesystem type.

Warning

Due to the way webR blocks for input in the worker thread, the IDBFS filesystem type does not work when using the SharedArrayBuffer communication channel. WebR must be configured to use the PostMessage communication channel to use IDBFS persistent storage.

Mounting

First, create a directory to contain the IndexedDB filesystem, then use either the JavaScript or R mount API with type "IDBFS".

JavaScript

await webR.FS.mkdir('/data');
await webR.FS.mount('IDBFS', {}, '/data');
await webR.FS.syncfs(true);

R

dir.create("/data")
webr::mount(mountpoint = "/data", type = "IDBFS")
webr::syncfs(TRUE)

After mounting the filesystem using mount(), the syncfs() function should be invoked with its populate argument set to true. This extra step is required to initialise the virtual filesystem with any previously existing data files in the browser’s IndexedDB storage. Without it, the filesystem will always be mounted initially as an empty directory.

For more information, see the Emscripten FS API IDBFS and FS.syncfs() documentation.
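The order of these steps matters: create the mountpoint, mount, then populate from IndexedDB before reading anything. The sketch below wraps the steps in a hypothetical mountPersistentDir() helper. So that it runs without webR, it is written against a hypothetical FsLike interface that mirrors only the three webR.FS methods used above. In an application you would pass webR.FS itself.

```typescript
// Hypothetical interface mirroring the webR.FS methods used in this section.
interface FsLike {
  mkdir(path: string): Promise<unknown>;
  mount(type: string, options: object, mountpoint: string): Promise<unknown>;
  syncfs(populate: boolean): Promise<unknown>;
}

// Hypothetical helper: mount an IDBFS directory and load previously
// persisted files into it. Each step is awaited, so the next one starts
// only after the previous one has finished.
async function mountPersistentDir(fs: FsLike, mountpoint: string): Promise<void> {
  await fs.mkdir(mountpoint);
  await fs.mount("IDBFS", {}, mountpoint);
  // populate = true copies IndexedDB contents into the virtual filesystem;
  // skipping it leaves the mountpoint empty.
  await fs.syncfs(true);
}
```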

Persisting the filesystem to IndexedDB

The syncfs() function should be invoked with its populate argument set to false to persist the current state of the filesystem to the browser’s IndexedDB storage.

JavaScript

await webR.FS.syncfs(false);

R

webr::syncfs(FALSE)

After writing to the virtual filesystem, be sure to invoke syncfs(false) before the web page containing webR is closed. This flushes the filesystem data and writes it to the browser’s IndexedDB-based persistent storage.
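In a browser, one approach is to persist whenever the page becomes hidden. This happens when the user switches tabs or minimises the window, as well as when the page is closed. The sketch below uses a hypothetical persistWhenHidden() helper, not part of webR. It is best effort only: the browser does not wait for the sync's Promise before unloading the page, so a sync started while closing may not complete. Calling syncfs(false) soon after writing remains the more dependable option. The event target and visibility check are parameters so the helper can run outside a browser.

```typescript
// Hypothetical helper (not part of webR): persist whenever the page is
// hidden. In a page, `target` would be `document` and `isHidden` would check
// `document.visibilityState === "hidden"`.
function persistWhenHidden(
  target: EventTarget,
  isHidden: () => boolean,
  syncfs: (populate: boolean) => Promise<unknown>,
): void {
  target.addEventListener("visibilitychange", () => {
    if (isHidden()) {
      // Best effort: nothing waits for this Promise to settle.
      syncfs(false).catch((err) => console.error("syncfs(false) failed:", err));
    }
  });
}
```

In a page this might be wired up as persistWhenHidden(document, () => document.visibilityState === 'hidden', (populate) => webR.FS.syncfs(populate)).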

Warning

Operations performed using IndexedDB are asynchronous. If you mount IDBFS filesystems and access data non-interactively, use the JavaScript API and wait for the Promise returned by webR.FS.syncfs(false) to resolve before continuing, for example by using the await keyword.

In a future version of webR the webr::syncfs() function will similarly return a Promise-like object.
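If an application persists from several places, for example after each write, the individual syncs can overlap. One way to avoid this is to queue them so that they run one at a time while each caller can still await its own sync. SyncQueue below is a hypothetical sketch of that pattern, not part of webR. It takes the sync function as a constructor argument so it can be tested without a browser. In a page it would wrap (populate) => webR.FS.syncfs(populate).

```typescript
// Hypothetical helper (not part of webR): run persist operations one after
// another, so a new sync never starts while a previous one is still running.
class SyncQueue {
  private tail: Promise<void> = Promise.resolve();
  private readonly syncfs: (populate: boolean) => Promise<unknown>;

  constructor(syncfs: (populate: boolean) => Promise<unknown>) {
    this.syncfs = syncfs;
  }

  // Queue a persist (populate = false). The returned Promise settles when
  // this particular sync has finished.
  persist(): Promise<void> {
    const run = this.tail.then(() => this.syncfs(false)).then(() => undefined);
    // Keep the chain usable even if this sync fails; the caller still sees
    // the rejection through `run`.
    this.tail = run.catch(() => undefined);
    return run;
  }
}
```

Usage would look like const queue = new SyncQueue((populate) => webR.FS.syncfs(populate)); followed by await queue.persist(); after each batch of writes.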

Web storage caveats

Filesystem data stored in an IndexedDB database can only be accessed within the current origin, that is, the combination of the web page’s scheme (protocol), host domain, and port.

The way in which web browsers decide how much storage space to allocate for data, and what to remove when limits are reached, differs between browsers and is not always simple to calculate. Be aware of browser storage quotas and eviction criteria, and note that data stored in an IDBFS filesystem is kept only on a “best-effort” basis. It can be removed by the browser at any time, either autonomously or by the user through the browser’s UI.

In private browsing mode, for example, stored data is usually deleted when the private session ends.
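Browsers expose the StorageManager API (navigator.storage) for inspecting usage against quota and for requesting persistent storage. If the browser grants persistence, the origin’s data is no longer evicted automatically under storage pressure, although the user can still delete it. The sketch below uses a hypothetical checkStorage() helper, not part of webR. It is written against a hypothetical StorageLike interface that mirrors estimate(), persisted(), and persist(), so it can run outside a browser. In a page you would pass navigator.storage.

```typescript
// Hypothetical interface mirroring the parts of the browser's StorageManager
// (navigator.storage) used below.
interface StorageLike {
  estimate(): Promise<{ usage?: number; quota?: number }>;
  persisted(): Promise<boolean>;
  persist(): Promise<boolean>;
}

interface StorageReport {
  usageBytes: number;
  quotaBytes: number;
  persistent: boolean;
}

// Hypothetical helper: report usage against quota and, unless already
// granted, ask the browser to switch this origin from best-effort to
// persistent storage. The browser may refuse, leaving `persistent` false.
async function checkStorage(storage: StorageLike): Promise<StorageReport> {
  const { usage = 0, quota = 0 } = await storage.estimate();
  // persist() is only requested when persisted() reports false.
  const persistent = (await storage.persisted()) || (await storage.persist());
  return { usageBytes: usage, quotaBytes: quota, persistent };
}
```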

Footnotes

  1. Be aware of the current GitHub issue #328.

  2. Using the PostMessage communication channel only.

  3. See the file_packager Emscripten documentation for details.

  4. When using the file_packager CLI, a third file named output.js will also be created. If you only plan to mount the image using webR, this file may be discarded.

  5. For example, GitHub Pages, Netlify, or AWS S3.