Mounting Filesystem Data

The virtual filesystem

The Emscripten filesystem API provides a Unix-like virtual filesystem for the WebAssembly (Wasm) R process running in webR. This virtual filesystem can mount filesystem images or host directories so that the associated file and directory data is accessible to the Wasm R process.

Mounting images and directories in this way gives the Wasm R process access to arbitrary external data, potentially including datasets, scripts, or R packages pre-compiled for WebAssembly.

Emscripten’s API allows for several types of virtual filesystem, depending on the execution environment. The following filesystems are available for use with webR:

Filesystem	Description	Web Browser	Node.js
`WORKERFS`	Mount Emscripten filesystem images.	✅	✅
`NODEFS`	Mount existing host directories.	❌	✅
`IDBFS`	Browser-based persistent storage using the IndexedDB API.	✅¹	❌

Filesystem images

Filesystem images are pre-prepared files containing a collection of files and associated metadata. The WORKERFS filesystem type can be used to efficiently make the contents of a filesystem image available to the WebAssembly R process.

Emscripten’s `file_packager` tool

The file_packager tool, provided by Emscripten, takes a directory structure as input and produces a webR-compatible filesystem image as output. It may be invoked from R using the rwasm R package:

> rwasm::file_packager("./input", out_dir = ".", out_name = "output")

It can also be invoked directly using its CLI², if you prefer:

$ file_packager output.data --preload ./input@/ \
    --separate-metadata --js-output=output.js

In the above examples, the files in the directory ./input are packaged and an output filesystem image is created³ consisting of a data file, output.data, and a metadata file, output.js.metadata.

To prepare for mounting the filesystem image with webR, ensure that both files have the same basename (in this example, output). The resulting URLs or relative paths for the two files should differ only by the file extension.
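The naming rule above can be sketched as a small helper that derives both file locations from one shared base. The function name `imageUrls` is hypothetical and not part of webR; it only illustrates that the two paths differ by extension alone.

```javascript
// Hypothetical helper (not part of webR): derive the data and metadata
// locations of a filesystem image from a shared base URL or path.
function imageUrls(base) {
  // Accept the base itself, or the URL of either file, by stripping
  // a trailing ".data" or ".js.metadata" extension.
  const stem = base.replace(/\.(data|js\.metadata)$/, '');
  return { data: `${stem}.data`, metadata: `${stem}.js.metadata` };
}

const urls = imageUrls('https://example.com/output.data');
console.log(urls.data);
console.log(urls.metadata);
```

Keeping the derivation in one place makes it harder for the two URLs to drift apart when an image is renamed or moved.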

Compression

Filesystem image .data files may optionally be gzip compressed prior to deployment. The file extension for compressed filesystem images should be .data.gz, and compression should be indicated by setting the property gzip: true on the metadata JSON stored in the .js.metadata file.
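As a rough sketch of the bookkeeping described above, the following helpers flag a metadata object as compressed and compute the expected file name. Both function names are hypothetical, and the sample metadata object is illustrative only; real metadata is generated by file_packager. The compression itself is assumed to be done separately, for example with a standard gzip tool.

```javascript
// Hypothetical helper: return a copy of the metadata with compression flagged.
function markGzipped(metadata) {
  return { ...metadata, gzip: true };
}

// Hypothetical helper: file name of the compressed image for a .data file.
function gzippedName(dataName) {
  if (!dataName.endsWith('.data')) {
    throw new Error(`expected a .data file, got: ${dataName}`);
  }
  return `${dataName}.gz`;
}

// Illustrative metadata only; the real content comes from the .js.metadata file.
const sampleMetadata = {
  files: [{ filename: '/hello.txt', start: 0, end: 12 }],
  remote_package_size: 12,
};
console.log(JSON.stringify(markGzipped(sampleMetadata)));
console.log(gzippedName('output.data'));
```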

Process archives with the `rwasm` package

Archives in .tar format, optionally gzip compressed as .tar.gz or .tgz files, can also be used as filesystem images by pre-processing the .tar archive using the rwasm R package. The rwasm::add_tar_index() function reads the archive contents and appends the required filesystem metadata to the end of the .tar archive data in a way that is understood by webR.

> rwasm::add_tar_index("./path/to/archive.tar.gz")

Once processed by the rwasm R package, the archive can be deployed and used directly as a filesystem image.

Mounting a filesystem image

When running in a web browser, the webr::mount() function downloads and mounts a filesystem image from a URL source, using the WORKERFS filesystem type.

webr::mount(
  mountpoint = "/data",
  source = "https://example.com/output.data"
)

Filesystem images should be deployed to static file hosting⁴ and the resulting URL provided as the source argument. The image will be mounted in the virtual filesystem under the path given by the mountpoint argument. If the mountpoint directory does not exist, it will be created prior to mounting.

When running under Node.js, the source may also be provided as a relative path to a filesystem image on disk.

JavaScript API

WebR’s JavaScript API includes the WebR.FS.mount() function, a thin wrapper around Emscripten’s own FS.mount(). The JavaScript API provides more flexibility but requires a little more set up, including creating the mountpoint directory if it does not already exist.

The filesystem type should be provided as a string, with the options argument of type FSMountOptions. The filesystem image data should be provided either as a JavaScript Blob object or an ArrayBuffer-like object, and the metadata provided as a JavaScript object that has been deserialised from the underlying JSON content.

JavaScript:

// Create mountpoint
await webR.FS.mkdir('/data');

// Download image data
const data = await fetch('https://example.com/output.data');
const metadata = await fetch('https://example.com/output.js.metadata');

// Mount image data
const options = {
  packages: [{
    blob: await data.blob(),
    metadata: await metadata.json(),
  }],
};
await webR.FS.mount('WORKERFS', options, '/data');

TypeScript:

import { FSMountOptions } from 'webr';

// Create mountpoint
await webR.FS.mkdir('/data');

// Download image data
const data = await fetch('https://example.com/output.data');
const metadata = await fetch('https://example.com/output.js.metadata');

// Mount image data
const options: FSMountOptions = {
  packages: [{
    blob: await data.blob(),
    metadata: await metadata.json(),
  }],
};
await webR.FS.mount('WORKERFS', options, '/data');

See the Emscripten FS.mount() documentation for further details about the structure of the options argument.
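The download-and-mount steps can also be wrapped in one reusable function that checks each HTTP response before mounting. This is a minimal sketch, not part of the webR API: the name `mountImage` and its `fetchFn` parameter are assumptions made for illustration, and the function expects an initialised WebR instance to be passed in.

```javascript
// Hypothetical helper. `webR` is an initialised WebR instance. `fetchFn`
// defaults to the global fetch and is a parameter so that it can be stubbed.
async function mountImage(webR, baseUrl, mountpoint, fetchFn = fetch) {
  // The two files share a base name and differ only by extension.
  const [data, metadata] = await Promise.all([
    fetchFn(`${baseUrl}.data`),
    fetchFn(`${baseUrl}.js.metadata`),
  ]);
  // Fail early on HTTP errors rather than mounting an error page as an image.
  for (const res of [data, metadata]) {
    if (!res.ok) throw new Error(`Download failed (${res.status}): ${res.url}`);
  }
  const options = {
    packages: [{ blob: await data.blob(), metadata: await metadata.json() }],
  };
  // The JavaScript API does not create the mountpoint automatically.
  await webR.FS.mkdir(mountpoint);
  await webR.FS.mount('WORKERFS', options, mountpoint);
}

// Usage, assuming `webR` has already been initialised:
// await mountImage(webR, 'https://example.com/output', '/data');
```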

Mount an existing host directory

The NODEFS filesystem type maps directories that exist on the host machine so that they are accessible in the WebAssembly process.

Warning

NODEFS is only available when running webR under Node.js.

To mount the directory ./extra on the virtual filesystem at /data, use either the JavaScript or R mount API with the filesystem type set to "NODEFS".

JavaScript:

await webR.FS.mkdir('/data');
await webR.FS.mount('NODEFS', { root: './extra' }, '/data');

R:

webr::mount(
  mountpoint = "/data",
  source = "./extra",
  type = "NODEFS"
)

IndexedDB Filesystem Storage

When using webR in a web browser, an IndexedDB-based persistent storage space can be mounted using the IDBFS filesystem type.

Warning

Due to the way webR blocks for input in the worker thread, the IDBFS filesystem type does not work when using the SharedArrayBuffer communication channel. WebR must be configured to use the PostMessage communication channel to use IDBFS persistent storage.
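A minimal sketch of that configuration follows. It assumes the `webr` package's `WebR` class and `ChannelType` enum; the wrapper function `startWebRForIDBFS` is hypothetical and takes the imported module as an argument so the sketch stays self-contained.

```javascript
// Hypothetical wrapper. `webrModule` is the imported 'webr' package,
// for example the result of `await import('webr')`.
async function startWebRForIDBFS(webrModule) {
  const { WebR, ChannelType } = webrModule;
  // Request the PostMessage channel explicitly instead of relying on the
  // default channel selection.
  const webR = new WebR({ channelType: ChannelType.PostMessage });
  await webR.init();
  return webR;
}

// Usage, assuming the 'webr' package is available:
// const webR = await startWebRForIDBFS(await import('webr'));
```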

Mounting

First, create a directory to contain the IndexedDB filesystem, then use either the JavaScript or R mount API with type "IDBFS".

JavaScript:

await webR.FS.mkdir('/data');
await webR.FS.mount('IDBFS', {}, '/data');
await webR.FS.syncfs(true);

R:

dir.create("/data")
webr::mount(mountpoint = "/data", type = "IDBFS")
webr::syncfs(TRUE)

After mounting the filesystem using mount(), the syncfs() function should be invoked with its populate argument set to true. This extra step is required to initialise the virtual filesystem with any previously existing data files in the browser’s IndexedDB storage. Without it, the filesystem will always be initially mounted as an empty directory.

For more information, see the Emscripten FS API IDBFS and FS.syncfs() documentation.

Persisting the filesystem to IndexedDB

The syncfs() function should be invoked with its populate argument set to false to persist the current state of the filesystem to the browser’s IndexedDB storage.

JavaScript:

await webR.FS.syncfs(false);

R:

webr::syncfs(FALSE)

After writing to the virtual filesystem, invoke syncfs(false) before the web page containing webR is closed, so that the filesystem data is flushed and written to the IndexedDB-based persistent storage.

Warning

Operations performed using IndexedDB are done asynchronously. If you are mounting IDBFS filesystems and accessing data non-interactively you should use the JavaScript API and be sure to wait for the Promise returned by webR.FS.syncfs(false) to resolve before continuing, for example by using the await keyword.

In a future version of webR the webr::syncfs() function will similarly return a Promise-like object.
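One way to make the wait explicit in non-interactive code is to pair each batch of writes with an awaited sync. The sketch below is an illustration only: `withPersistence` is a hypothetical helper, not part of webR, and it assumes an initialised WebR instance with an IDBFS filesystem already mounted and populated.

```javascript
// Hypothetical helper: run `work`, then wait for the filesystem state to be
// written to IndexedDB before returning.
async function withPersistence(webR, work) {
  const result = await work(webR);
  // populate = false: persist the in-memory state to IndexedDB.
  // Awaiting the Promise ensures the flush has finished before continuing.
  await webR.FS.syncfs(false);
  return result;
}

// Usage, assuming IDBFS is mounted at /data:
// await withPersistence(webR, (w) =>
//   w.evalRVoid('writeLines("hello", "/data/hello.txt")'));
```

The sync runs only when `work` succeeds, so a failed batch is not persisted automatically; whether that is the right policy depends on the application.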

Web storage caveats

Filesystem data stored in an IndexedDB database can only be accessed within the current origin, loosely defined as the current web page’s host domain and port.

The way in which web browsers decide how much storage space to allocate for data and what to remove when limits are reached differs between browsers and is not always simple to calculate. Be aware of browser storage quotas and eviction criteria and note that data stored in an IDBFS filesystem type is stored only on a “best-effort” basis. It can be removed by the browser at any time, autonomously or by the user interacting through the browser’s UI.

In private browsing mode, for example, stored data is usually deleted when the private session ends.
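Browsers expose some of this information through the standard StorageManager interface, available as `navigator.storage`. The sketch below uses its `estimate()` and `persist()` methods; the helper name `checkStorage` and its return shape are assumptions for illustration. Reported figures are estimates, and a browser may decline the persistence request or prompt the user.

```javascript
// Hypothetical helper. `storage` is a StorageManager, for example
// `navigator.storage` in a web browser.
async function checkStorage(storage, bytesNeeded) {
  // estimate() reports approximate usage and quota for the current origin.
  const { usage = 0, quota = 0 } = await storage.estimate();
  // persist() asks the browser to exempt this origin from automatic eviction.
  // It resolves to false if the request is declined, and users can still
  // clear the data manually through the browser's UI.
  const persisted = await storage.persist();
  return { enough: quota - usage >= bytesNeeded, persisted };
}

// Usage in a browser:
// const status = await checkStorage(navigator.storage, 50 * 1024 * 1024);
```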

Footnotes

¹ Using the PostMessage communication channel only.
² See the file_packager Emscripten documentation for details.
³ When using the file_packager CLI, a third file named output.js will also be created. If you only plan to mount the image using webR, this file may be discarded.
⁴ For example, GitHub Pages, Netlify, or AWS S3.
