Hierarchical Namespace (HNS)

To train, checkpoint, and serve AI models at peak efficiency, Google Cloud Storage (GCS) offers Hierarchical Namespace (HNS).

gcsfs provides full support for all data and metadata operations on HNS buckets.

What is a Hierarchical Namespace (HNS)?

Historically, GCS buckets have utilized a flat namespace. In a flat namespace, directories do not exist as distinct physical entities; they are simulated by 0-byte objects ending in a slash (/) or by filtering object prefixes during list operations.

A Hierarchical Namespace (HNS) introduces true, logical directories as first-class resources to GCS.

Under the Hood: The ExtendedFileSystem

gcsfs utilizes the ExtendedFileSystem class under the hood (implemented in gcsfs/extended_gcsfs.py).

Importantly, ExtendedFileSystem is designed to be fully backward-compatible. Before executing directory operations, it automatically identifies the underlying bucket type. If it detects a standard flat-namespace bucket, it routes the request back to standard object-level operations, ensuring your existing buckets continue to work without issue.

The fundamental architectural shift is that ExtendedFileSystem actively routes directory-level operations to the GCS Folders grpc API instead of relying solely on the Objects API.

Operation Semantics: Flat Namespace vs. HNS

Operation

Flat Namespace (Standard gcsfs)

HNS Namespace (ExtendedFileSystem)

``mkdir``

Only used for creating buckets, since GCS Flat namespace doesn’t have real directories.

Calls the native GCS Folders API, creating physical GCS Folder resource instead of simulating with 0 byte object or object prefix.

``rmdir``

Primarily used to delete buckets, as directories do not exist as distinct physical entities.

Used to delete empty folders natively via the GCS Folders API, in addition to deleting buckets.

``rm``

Paginates through and individually issues delete requests for every object matching the prefix.

Deletes the folder resource and its contents via different delete requests corresponding to folder or file.

``rename`` / ``mv``

Issues a Copy request for each object under the prefix, followed by Delete. Non-atomic, O(N).

Triggers a single native metadata-only rename on the folder. Atomic and more performant, O(1), helpful in Checkpointing.

``info``

Infers directory existence by checking for child objects, returning mocked 0-byte metadata.

Uses get_folder_metadata to explicitly query the Folders API, returning accurate metadata (creation time, resource IDs).

Important Differences to Keep in Mind

While gcsfs aims to abstract the differences via the fsspec API, you should be aware of standard HNS limitations imposed by the Google Cloud Storage API:

  1. Implicit directories: In standard GCS, you can create an object a/b/c.txt without the directories a/ or a/b/ physically existing. In HNS, the parent folder resources must exist (or be created) before the object can be written. gcsfs handles parent folder creation natively under the hood.

  2. ``mkdir`` behavior: Previously, in a flat namespace, calling mkdir on a path could only ensure the underlying bucket exists. With HNS enabled, calling mkdir will create an actual folder resource in GCS. Furthermore, if you want to create nested folders (eg: bucket/a/b/c/d), pass create_parents=True, it will physically create all intermediate folder resources along the specified path.

  3. No mixing or toggling: You cannot toggle HNS on an existing flat-namespace bucket. You must create a new HNS bucket and migrate your data.

  4. Object naming: Object names in HNS cannot end with a slash (/) unless without the creation of physical folder resources.

  5. Non-Recursive Wildcard Deletions: When using wildcard deletions without recursion (e.g., gcs.rm("bucket/dir/*", recursive=False)), if the pattern matches a non-empty directory, HNS buckets will raise an error (typically surfaced as an OSError due to a failed precondition). In contrast, standard flat-namespace buckets silently ignore non-empty directories under the same circumstances.

  6. Rename Operation Benchmarks

The following benchmarks show the time taken (in seconds) to rename a directory containing a large number of files (spread across 256 folders and 8 levels) in a standard Regional bucket versus an HNS bucket (can be replicated using gcsfs/tests/perf/microbenchmarks/rename):

File Count

Standard Regional (seconds)

HNS (seconds)

65K Files

75.69

15.4

100K Files

170.6

23.2

For more details on managing these buckets, refer to the official documentation for Hierarchical Namespace.

Disabling HNS Support

You can disable these features by explicitly setting an environment variable of the same name.

Code Example

export GCSFS_EXPERIMENTAL_ZB_HNS_SUPPORT=false

Note: The choice of which filesystem class to use is made at import time based on the GCSFS_EXPERIMENTAL_ZB_HNS_SUPPORT environment variable, and cannot be controlled via constructor arguments passed to GCSFileSystem (but you can still import each class explicitly, if needed).