Mounting Filesystem Data
The virtual filesystem
The Emscripten filesystem API provides a Unix-like virtual filesystem for the WebAssembly (Wasm) R process running in webR. This virtual filesystem has the ability to mount filesystem images or host directories so that the associated file and directory data is accessible to the Wasm R process.
Mounting images and directories in this way gives the Wasm R process access to arbitrary external data, potentially including datasets, scripts, or R packages pre-compiled for WebAssembly.
Emscripten’s API allows for several types of virtual filesystem, depending on the execution environment. The following filesystems are available for use with webR:
| Filesystem | Description | Web Browser | Node.js |
|---|---|---|---|
| WORKERFS | Mount Emscripten filesystem images. | ✅ | ✅ |
| NODEFS | Mount existing host directories. | ❌ | ✅ |
| IDBFS | Browser-based persistent storage using the IndexedDB API. | ✅¹ | ❌ |
Filesystem images
Filesystem images are pre-prepared files containing a collection of files and associated metadata. The WORKERFS filesystem type can be used to efficiently make the contents of a filesystem image available to the WebAssembly R process.
Emscripten’s file_packager tool
The file_packager tool, provided by Emscripten, takes a directory structure as input and produces a webR-compatible filesystem image as output. The file_packager tool may be invoked from R using the rwasm R package:
> rwasm::file_packager("./input", out_dir = ".", out_name = "output")
It can also be invoked directly using its CLI², if you prefer:
$ file_packager output.data --preload ./input@/ \
--separate-metadata --js-output=output.js
In the above examples, the files in the directory ./input are packaged and an output filesystem image is created³, consisting of a data file, output.data, and a metadata file, output.js.metadata.
To prepare for mounting the filesystem image with webR, ensure that both files have the same basename (in this example, output). The resulting URLs or relative paths for the two files should differ only by the file extension.
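Because the two URLs differ only by extension, the metadata URL can be derived mechanically from the image URL. A minimal sketch, assuming URLs without query strings or fragments; metadataUrlFor is a hypothetical helper, not part of webR. It also accepts the .data.gz extension used for compressed images.

```typescript
// Hypothetical helper (not part of webR): derive the metadata URL for a
// filesystem image by swapping the image extension for ".js.metadata".
// Assumes the URL has no query string or fragment.
function metadataUrlFor(imageUrl: string): string {
  for (const ext of [".data.gz", ".data"]) {
    if (imageUrl.endsWith(ext)) {
      return imageUrl.slice(0, imageUrl.length - ext.length) + ".js.metadata";
    }
  }
  throw new Error(`Not a filesystem image URL: ${imageUrl}`);
}

// Both files share the basename "output".
console.log(metadataUrlFor("https://example.com/output.data"));
// prints "https://example.com/output.js.metadata"
```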
Compression
Filesystem image .data files may optionally be gzip compressed prior to deployment. The file extension for compressed filesystem images should be .data.gz, and compression should be indicated by setting the property gzip: true on the metadata JSON stored in the .js.metadata file.
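Before deploying, it can be worth checking that the gzip flag in the metadata agrees with the data file. The sketch below relies only on the fact that every gzip stream starts with the magic bytes 0x1f 0x8b (RFC 1952); the ImageMetadata type and the helper names are illustrative assumptions, not part of webR or rwasm.

```typescript
// Assumed minimal shape of the metadata JSON: only the optional `gzip`
// flag is used here; any other fields are passed through untouched.
interface ImageMetadata {
  gzip?: boolean;
  [key: string]: unknown;
}

// Every gzip stream begins with the magic bytes 0x1f 0x8b.
function looksGzipped(bytes: Uint8Array): boolean {
  return bytes.length >= 2 && bytes[0] === 0x1f && bytes[1] === 0x8b;
}

// Hypothetical deployment check: the flag should match the data bytes.
function gzipFlagMatches(metadata: ImageMetadata, data: Uint8Array): boolean {
  return (metadata.gzip === true) === looksGzipped(data);
}

// Return a copy of the metadata marked as describing a compressed image.
function markCompressed(metadata: ImageMetadata): ImageMetadata {
  return { ...metadata, gzip: true };
}
```

Run such a check against the files on disk rather than against a fetched response: if a web server applies its own Content-Encoding, the browser decompresses the bytes before your code sees them.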
Process archives with the rwasm package
Archives in .tar format, optionally gzip compressed as .tar.gz or .tgz files, can also be used as filesystem images by pre-processing the .tar archive using the rwasm R package. The rwasm::add_tar_index() function reads the archive contents and appends the required filesystem metadata to the end of the .tar archive data in a way that is understood by webR.
> rwasm::add_tar_index("./path/to/archive.tar.gz")
Once processed by the rwasm R package, the archive can be deployed and used directly as a filesystem image.
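The exact format of the appended index is an rwasm implementation detail not described here. To illustrate what such an index has to record, the sketch below walks the 512-byte headers of an uncompressed ustar archive and computes the byte range of each regular file, the same kind of {filename, start, end} information that filesystem image metadata holds. indexTar and FileEntry are hypothetical names; this is not rwasm's algorithm.

```typescript
// Hypothetical sketch, not rwasm's implementation: list the byte range of
// every regular file in an uncompressed ustar archive. pax and GNU long-name
// extension headers are not interpreted.
interface FileEntry {
  filename: string; // path as recorded in the archive
  start: number;    // offset of the first content byte
  end: number;      // offset one past the last content byte
}

const BLOCK = 512; // tar headers and content are stored in 512-byte blocks

// Read a NUL-terminated ASCII field of at most `length` bytes.
function readField(buf: Uint8Array, offset: number, length: number): string {
  let s = "";
  for (let i = offset; i < offset + length && buf[i] !== 0; i++) {
    s += String.fromCharCode(buf[i]);
  }
  return s;
}

function indexTar(tar: Uint8Array): FileEntry[] {
  const entries: FileEntry[] = [];
  let pos = 0;
  while (pos + BLOCK <= tar.length) {
    const name = readField(tar, pos, 100);
    if (name === "") break; // a zero block marks the end of the archive
    const size = parseInt(readField(tar, pos + 124, 12).trim(), 8) || 0; // octal
    const typeflag = tar[pos + 156];
    const prefix = readField(tar, pos + 345, 155); // ustar path prefix
    const start = pos + BLOCK;
    // Regular files have typeflag '0' (48), or NUL in pre-POSIX archives.
    if (typeflag === 48 || typeflag === 0) {
      const filename = prefix ? `${prefix}/${name}` : name;
      entries.push({ filename, start, end: start + size });
    }
    // Skip the content, which is padded to a whole number of blocks.
    pos = start + Math.ceil(size / BLOCK) * BLOCK;
  }
  return entries;
}
```

For a .tar.gz archive the bytes would first need to be decompressed, for example with the standard DecompressionStream("gzip") web API.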
Mounting a filesystem image
When running in a web browser, the webr::mount() function downloads and mounts a filesystem image from a URL source, using the WORKERFS filesystem type.
webr::mount(
  mountpoint = "/data",
  source = "https://example.com/output.data"
)
Filesystem images should be deployed to static file hosting⁴ and the resulting URL provided as the source argument. The image will be mounted in the virtual filesystem under the path given by the mountpoint argument. If the mountpoint directory does not exist, it will be created prior to mounting.
When running under Node.js, the source may also be provided as a relative path to a filesystem image on disk.
JavaScript API
WebR’s JavaScript API includes the WebR.FS.mount() function, a thin wrapper around Emscripten’s own FS.mount(). The JavaScript API provides more flexibility but requires a little more setup, including creating the mountpoint directory if it does not already exist.
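If the mountpoint is nested, each parent directory may need to be created first. A minimal sketch, assuming webR.FS.mkdir() creates a single directory in the same way as Emscripten's non-recursive FS.mkdir(); the MountpointFS interface, and in particular its exists() method, is a hypothetical stand-in for whatever existence check you use.

```typescript
// Hypothetical minimal interface. mkdir() corresponds to webR.FS.mkdir();
// exists() is an assumed helper, not a documented webR function.
interface MountpointFS {
  exists(path: string): Promise<boolean>;
  mkdir(path: string): Promise<unknown>;
}

// Create each missing component of an absolute path in order, e.g.
// "/data/images" creates "/data" (if needed) and then "/data/images".
async function ensureMountpoint(fs: MountpointFS, path: string): Promise<void> {
  let current = "";
  for (const part of path.split("/").filter((p) => p.length > 0)) {
    current += "/" + part;
    if (!(await fs.exists(current))) {
      await fs.mkdir(current);
    }
  }
}
```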
The filesystem type should be provided as a string, with the options argument of type FSMountOptions. The filesystem image data should be provided either as a JavaScript Blob object or an ArrayBuffer-like object, and the metadata provided as a JavaScript object that has been deserialised from the underlying JSON content.
// Create mountpoint
await webR.FS.mkdir('/data');
// Download image data
const data = await fetch('https://example.com/output.data');
const metadata = await fetch('https://example.com/output.js.metadata');
// Mount image data
const options = {
  packages: [{
    blob: await data.blob(),
    metadata: await metadata.json(),
  }],
};
await webR.FS.mount("WORKERFS", options, '/data');
In TypeScript, the options object can be typed using FSMountOptions, exported by webR:
import { FSMountOptions } from 'webr';
// Create mountpoint
await webR.FS.mkdir('/data');
// Download image data
const data = await fetch('https://example.com/output.data');
const metadata = await fetch('https://example.com/output.js.metadata');
// Mount image data
const options: FSMountOptions = {
  packages: [{
    blob: await data.blob(),
    metadata: await metadata.json(),
  }],
};
await webR.FS.mount("WORKERFS", options, '/data');
See the Emscripten FS.mount() documentation for further details about the structure of the options argument.
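Each entry of packages pairs one image's data with its metadata. The key part of the metadata is a files array giving each file's path together with its start and end byte offsets within the data file. The sketch below, using hypothetical names, shows how those offsets map onto the data; Emscripten's WORKERFS does something similar with Blob slices, so file contents are only read when accessed.

```typescript
// Assumed shape of the deserialised .js.metadata JSON: only the fields used
// here are declared. Each file is a byte range within the .data file.
interface PackageFile {
  filename: string; // path inside the image, e.g. "/mtcars.csv"
  start: number;    // offset of the file's first byte in the data file
  end: number;      // offset one past the file's last byte
}

interface PackageMetadata {
  files: PackageFile[];
}

// Illustration only: split one image's data into its individual files.
// subarray() creates views onto the same buffer, so no bytes are copied.
function splitImage(data: Uint8Array, metadata: PackageMetadata): Map<string, Uint8Array> {
  const files = new Map<string, Uint8Array>();
  for (const file of metadata.files) {
    files.set(file.filename, data.subarray(file.start, file.end));
  }
  return files;
}
```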
Mount an existing host directory
The NODEFS filesystem type maps directories that exist on the host machine so that they are accessible in the WebAssembly process. NODEFS is only available when running webR under Node.js.
To mount the directory ./extra on the virtual filesystem at /data, use either the JavaScript or R mount API with the filesystem type set to "NODEFS".
await webR.FS.mkdir('/data')
await webR.FS.mount('NODEFS', { root: './extra' }, '/data');
webr::mount(
  mountpoint = "/data",
  source = "./extra",
  type = "NODEFS"
)
IndexedDB Filesystem Storage
When using webR in a web browser, an IndexedDB-based persistent storage space can be mounted using the IDBFS filesystem type.
Due to the way webR blocks for input in the worker thread, the IDBFS filesystem type does not work when using the SharedArrayBuffer communication channel. WebR must be configured to use the PostMessage communication channel to use IDBFS persistent storage.
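The communication channel is selected when constructing the WebR object. A configuration fragment, assuming webR is installed from npm as the webr package and the code runs as an ES module (for the top-level await):

```typescript
import { WebR, ChannelType } from 'webr';

// IDBFS needs the PostMessage channel rather than SharedArrayBuffer.
const webR = new WebR({ channelType: ChannelType.PostMessage });
await webR.init();
```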
Mounting
First, create a directory to contain the IndexedDB filesystem, then use either the JavaScript or R mount API with type "IDBFS".
await webR.FS.mkdir('/data');
await webR.FS.mount('IDBFS', {}, '/data');
await webR.FS.syncfs(true);
dir.create("/data")
webr::mount(mountpoint = "/data", type = "IDBFS")
webr::syncfs(TRUE)
After mounting the filesystem using mount(), the syncfs() function should be invoked with its populate argument set to true. This extra step is required to initialise the virtual filesystem with any data previously stored in the browser’s IndexedDB storage. Without it, the filesystem will always be initially mounted as an empty directory.
For more information, see the Emscripten FS API IDBFS and FS.syncfs() documentation.
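Conceptually, syncfs() compares the two sides and copies whatever differs in one direction. The following simplified model, not Emscripten's actual IDBFS code, treats each side as a map from path to modification time: entries that are missing or have a different timestamp in the destination are copied, and entries present only in the destination are removed. Snapshot, SyncPlan, and planSync are hypothetical names.

```typescript
// Simplified model of syncfs() reconciliation, not Emscripten's IDBFS code.
// Each side maps a file path to its last-modified timestamp.
type Snapshot = Map<string, number>;

interface SyncPlan {
  copy: string[];   // paths to create or overwrite in the destination
  remove: string[]; // paths that exist only in the destination
}

// populate = true:  IndexedDB -> in-memory virtual filesystem
// populate = false: in-memory virtual filesystem -> IndexedDB
function planSync(memory: Snapshot, idb: Snapshot, populate: boolean): SyncPlan {
  const src = populate ? idb : memory;
  const dst = populate ? memory : idb;
  const copy: string[] = [];
  const remove: string[] = [];
  src.forEach((mtime, path) => {
    if (dst.get(path) !== mtime) copy.push(path); // missing or different
  });
  dst.forEach((_mtime, path) => {
    if (!src.has(path)) remove.push(path);
  });
  return { copy, remove };
}
```

In this model, a freshly mounted (empty) directory shows why the populate step matters: planSync(new Map(), idb, true) copies every stored path into memory, whereas calling it with false first would plan to remove everything stored in IndexedDB.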
Persisting the filesystem to IndexedDB
The syncfs() function should be invoked with its populate argument set to false to persist the current state of the filesystem to the browser’s IndexedDB storage.
await webR.FS.syncfs(false);
webr::syncfs(FALSE)
After writing to the virtual filesystem, be sure to invoke syncfs(false) before the web page containing webR is closed, so that the filesystem data is flushed and written to the IndexedDB-based persistent storage.
Operations performed using IndexedDB are asynchronous. If you are mounting IDBFS filesystems and accessing data non-interactively, use the JavaScript API and wait for the Promise returned by webR.FS.syncfs(false) to resolve before continuing, for example by using the await keyword.
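One way to follow this advice in non-interactive code is to route every write through a helper that resolves only after syncfs(false) has finished. The sketch below (createPersister is a hypothetical name) also chains calls so that two sync operations are never in flight at once, which Emscripten warns about. It relies only on syncfs() returning a Promise, as described above.

```typescript
// The only part of webR.FS this sketch relies on.
interface SyncFS {
  syncfs(populate: boolean): Promise<unknown>;
}

// Hypothetical helper: run a write, then wait until the virtual filesystem
// has been flushed to IndexedDB. Calls are queued so they never overlap.
function createPersister(fs: SyncFS) {
  let queue: Promise<unknown> = Promise.resolve();
  return function persist(write: () => Promise<unknown>): Promise<void> {
    const next = queue.then(async () => {
      await write();
      await fs.syncfs(false);
    });
    // Keep the queue usable after a failure; the caller still sees the error.
    queue = next.catch(() => undefined);
    return next;
  };
}
```

With webR this might be used as const persist = createPersister(webR.FS); await persist(() => webR.evalRVoid('saveRDS(mtcars, "/data/mtcars.rds")'));.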
In a future version of webR, the webr::syncfs() function will similarly return a Promise-like object.
Web storage caveats
Filesystem data stored in an IndexedDB database can only be accessed within the current origin, defined by the scheme, host, and port of the current web page.
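To check whether two pages would share the same IndexedDB storage, compare their origins. A minimal sketch using the standard URL class; sameOrigin is a hypothetical helper name.

```typescript
// An origin is the scheme, host, and port of a URL. IndexedDB data written
// by one page is only visible to pages with the same origin.
function sameOrigin(a: string, b: string): boolean {
  return new URL(a).origin === new URL(b).origin;
}

console.log(sameOrigin("https://example.com/app/", "https://example.com/other/")); // true
console.log(sameOrigin("https://example.com/", "http://example.com/"));            // false
```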
The way in which web browsers decide how much storage space to allocate for data, and what to remove when limits are reached, differs between browsers and is not always simple to calculate. Be aware of browser storage quotas and eviction criteria, and note that data in an IDBFS filesystem is stored only on a “best-effort” basis. It can be removed by the browser at any time, either autonomously or by the user through the browser’s UI.
In private browsing mode, for example, stored data is usually deleted when the private session ends.
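Browsers report an estimate of current usage and quota through the standard navigator.storage.estimate() API. The sketch below, with hypothetical names, checks whether a planned write would fit; treat the answer as a rough guide, since the estimate is approximate and stored data can still be evicted.

```typescript
// Mirrors the two optional fields of the StorageEstimate object returned by
// the standard navigator.storage.estimate() browser API.
interface QuotaEstimate {
  usage?: number;
  quota?: number;
}

// Hypothetical helper: would writing `bytes` more data stay within the
// reported quota? Returns undefined when no quota is reported.
function fitsInQuota(estimate: QuotaEstimate, bytes: number): boolean | undefined {
  if (estimate.quota === undefined) return undefined;
  return (estimate.usage ?? 0) + bytes <= estimate.quota;
}

// In a browser:
//   const ok = fitsInQuota(await navigator.storage.estimate(), 50e6);
```

Pages can also request persistent storage with navigator.storage.persist(); whether it is granted varies by browser, and the user can still clear the data.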
Footnotes
1. Using the PostMessage communication channel only.
2. See the file_packager Emscripten documentation for details.
3. When using the file_packager CLI, a third file named output.js will also be created. If you only plan to mount the image using webR, this file may be discarded.
4. e.g. GitHub Pages, Netlify, AWS S3, etc.