Hierarchical Namespace (HNS)
To train, checkpoint, and serve AI models at peak efficiency, Google Cloud Storage (GCS) offers Hierarchical Namespace (HNS).
gcsfs provides full support for all data and metadata operations on HNS buckets.
What is a Hierarchical Namespace (HNS)?
Historically, GCS buckets have utilized a flat namespace. In a flat
namespace, directories do not exist as distinct physical entities; they are
simulated by 0-byte objects ending in a slash (/) or by filtering object
prefixes during list operations.
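To make the prefix-filtering idea concrete, here is a minimal sketch in plain Python. It is not gcsfs code: the `list_flat` helper and the sample object names are hypothetical, and the sketch only imitates how a directory view can be derived from a flat set of object names.

```python
def list_flat(object_names, prefix):
    """Emulate a delimiter-based listing over a flat set of object names."""
    files, dirs = [], set()
    for name in sorted(object_names):
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        if not rest:
            # The 0-byte placeholder object for the prefix itself.
            continue
        head, sep, _ = rest.partition("/")
        if sep:
            # Anything with a further "/" is reported as a simulated directory.
            dirs.add(prefix + head + "/")
        else:
            files.append(name)
    return files, sorted(dirs)


# Hypothetical bucket contents: note that "data/train/" has no object of its own.
objects = [
    "data/",
    "data/train/part-0.bin",
    "data/train/part-1.bin",
    "data/readme.txt",
]
files, dirs = list_flat(objects, "data/")
print(files)  # ['data/readme.txt']
print(dirs)   # ['data/train/']
```

The directory `data/train/` appears in the listing only because two object names happen to share that prefix; nothing named `data/train/` is stored.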
A Hierarchical Namespace (HNS) introduces true, logical directories as first-class resources to GCS.
Under the Hood: The ExtendedFileSystem
gcsfs utilizes the ExtendedFileSystem class under the hood (implemented in gcsfs/extended_gcsfs.py).
Importantly, ExtendedFileSystem is designed to be fully backward-compatible. Before executing directory operations, it automatically identifies the underlying bucket type. If it detects a standard flat-namespace bucket, it routes the request back to standard object-level operations, ensuring your existing buckets continue to work without issue.
The fundamental architectural shift is that ExtendedFileSystem actively routes directory-level operations to the GCS Folders gRPC API instead of relying solely on the Objects API.
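The routing idea can be pictured with a small, purely illustrative sketch. The `RoutingSketch` class below is hypothetical and is not the real ExtendedFileSystem: it assumes the set of HNS buckets is known up front, whereas the real class detects the bucket type itself, and it returns a description string instead of calling any API.

```python
class RoutingSketch:
    """Hypothetical stand-in that shows the dispatch decision only."""

    def __init__(self, hns_buckets):
        # Assumption: bucket types are supplied by the caller.
        self._hns_buckets = set(hns_buckets)

    def _is_hns(self, path):
        # The bucket is the first path component.
        bucket = path.lstrip("/").split("/", 1)[0]
        return bucket in self._hns_buckets

    def mkdir(self, path):
        if self._is_hns(path):
            return f"folders-api: create folder {path}"
        # Flat-namespace buckets keep the standard object-level behavior.
        return f"objects-api: standard handling for {path}"


router = RoutingSketch(hns_buckets=["hns-bucket"])
print(router.mkdir("hns-bucket/a/b"))
print(router.mkdir("flat-bucket/a/b"))
```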
| Operation | Flat Namespace (Standard | HNS Namespace ( |
|---|---|---|
| ``mkdir`` | Only used for creating buckets, since the GCS flat namespace doesn't have real directories. | Calls the native GCS Folders API, creating a physical GCS folder resource instead of simulating one with a 0-byte object or object prefix. |
| ``rmdir`` | Primarily used to delete buckets, as directories do not exist as distinct physical entities. | Used to delete empty folders natively via the GCS Folders API, in addition to deleting buckets. |
| ``rm`` | Paginates through the objects matching the prefix and issues an individual delete request for each one. | Deletes the folder resource and its contents via separate delete requests for folders and files. |
| ``rename`` / ``mv`` | Issues a | Triggers a single native metadata-only rename on the folder. Atomic and more performant, |
| ``info`` | Infers directory existence by checking for child objects, returning mocked 0-byte metadata. | Uses |
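As a rough illustration of the operations in the table, the sketch below runs them through the generic fsspec-style interface. The `tour_directory_ops` helper and the bucket name in the usage comment are hypothetical; the function accepts any filesystem object that exposes `mkdir`, `info`, `mv`, and `rm`, so the same calls apply to flat and HNS buckets, and only the work done behind them differs.

```python
def tour_directory_ops(fs, root):
    """Run a few directory-level calls against an fsspec-style filesystem."""
    nested = f"{root}/a/b/c/d"

    # On an HNS bucket this creates real folder resources along the path.
    fs.mkdir(nested, create_parents=True)

    # Look up the metadata of the newly created directory.
    info = fs.info(nested)

    # Rename the top-level directory, then remove it with its contents.
    fs.mv(f"{root}/a", f"{root}/renamed", recursive=True)
    fs.rm(f"{root}/renamed", recursive=True)
    return info


# Usage against a real bucket (assumes gcsfs is installed and credentials
# are configured; "my-hns-bucket" is a placeholder name):
#   import gcsfs
#   fs = gcsfs.GCSFileSystem()
#   tour_directory_ops(fs, "my-hns-bucket/scratch")
```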
Important Differences to Keep in Mind
While gcsfs aims to abstract the differences via the fsspec API, you should be aware of standard HNS limitations imposed by the Google Cloud Storage API:
Implicit directories: In standard GCS, you can create an object a/b/c.txt without the directories a/ or a/b/ physically existing. In HNS, the parent folder resources must exist (or be created) before the object can be written. gcsfs handles parent folder creation natively under the hood.
``mkdir`` behavior: Previously, in a flat namespace, calling mkdir on a path could only ensure that the underlying bucket exists. With HNS enabled, calling mkdir creates an actual folder resource in GCS. To create nested folders (e.g., bucket/a/b/c/d), pass create_parents=True, which physically creates all intermediate folder resources along the specified path.
No mixing or toggling: You cannot toggle HNS on an existing flat-namespace bucket. You must create a new HNS bucket and migrate your data.
Object naming: Object names in HNS cannot end with a slash (/) without the creation of a physical folder resource.
Non-recursive wildcard deletions: When using wildcard deletions without recursion (e.g., gcs.rm("bucket/dir/*", recursive=False)), if the pattern matches a non-empty directory, HNS buckets raise an error (typically surfaced as an OSError due to a failed precondition). In contrast, standard flat-namespace buckets silently ignore non-empty directories under the same circumstances.
Rename Operation Benchmarks
The following benchmarks show the time taken (in seconds) to rename a directory containing a large number of files (spread across 256 folders and 8 levels) in a standard Regional bucket versus an HNS bucket (can be replicated using gcsfs/tests/perf/microbenchmarks/rename):
| File Count | Standard Regional (seconds) | HNS (seconds) |
|---|---|---|
| 65K Files | 75.69 | 15.4 |
| 100K Files | 170.6 | 23.2 |
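For a quick sense of scale, the speedup implied by these timings can be computed directly. The snippet below uses only the four numbers from the table; the variable names are illustrative.

```python
# Rename timings in seconds, copied from the benchmark table:
# (standard regional bucket, HNS bucket)
timings = {
    "65K files": (75.69, 15.4),
    "100K files": (170.6, 23.2),
}

speedups = {}
for label, (standard_s, hns_s) in timings.items():
    speedups[label] = standard_s / hns_s
    print(f"{label}: HNS rename is about {speedups[label]:.1f}x faster")
```

This works out to roughly 4.9x at 65K files and 7.4x at 100K files, so in these two measurements the gap widens as the file count grows.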
For more details on managing these buckets, refer to the official documentation for Hierarchical Namespace.
Disabling HNS Support
You can disable these features by explicitly setting the GCSFS_EXPERIMENTAL_ZB_HNS_SUPPORT environment variable to false.
Code Example
export GCSFS_EXPERIMENTAL_ZB_HNS_SUPPORT=false
Note: The choice of which filesystem class to use is made at import time based on the GCSFS_EXPERIMENTAL_ZB_HNS_SUPPORT environment variable, and cannot be controlled via constructor arguments passed to GCSFileSystem (but you can still import each class explicitly, if needed).
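Because the choice is made at import time, one way to set the flag from Python rather than from the shell is to write it into the process environment before gcsfs is first imported. The sketch below is an assumption-laden illustration: the `disable_hns_support` helper is hypothetical, and it only reports whether gcsfs had already been imported when the flag was set.

```python
import os
import sys

ENV_VAR = "GCSFS_EXPERIMENTAL_ZB_HNS_SUPPORT"


def disable_hns_support():
    """Set the flag to "false"; return True if gcsfs was not yet imported."""
    already_imported = "gcsfs" in sys.modules
    os.environ[ENV_VAR] = "false"
    return not already_imported


set_in_time = disable_hns_support()
if not set_in_time:
    # The filesystem class was already chosen when gcsfs was imported.
    print("gcsfs was imported before the flag was set; it may have no effect")

# Import gcsfs only after the flag is in place:
#   import gcsfs
#   fs = gcsfs.GCSFileSystem()
```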