HugBucket is built in three layers. A protocol layer accepts connections from S3 and FTP clients, a backend interface defines the contract between protocols and storage, and a provider layer implements that contract against Hugging Face Hub and Xet CAS.
Three-layer architecture
Protocol layer
S3 (aiohttp-based) and FTP (pyftpdlib-based) adapters parse client requests and translate them into backend calls. Neither adapter knows anything about HF or Xet.
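To make the translation step concrete, here is a minimal sketch of how an S3 adapter might map a few request shapes onto backend calls. The method names list_buckets, head_bucket, put_object and delete_object appear on this page; dispatch_s3, RecordingBackend and every argument list are hypothetical, and real S3 routing (query strings, multipart uploads, signature checks) is left out.

```python
import asyncio
from typing import Any, List, Tuple


async def dispatch_s3(backend: Any, method: str, path: str, body: bytes = b"") -> Any:
    """Hypothetical: route an S3-style (method, path) pair to a backend method."""
    bucket, _, key = path.lstrip("/").partition("/")
    if method == "GET" and not bucket:
        return await backend.list_buckets()                 # GET /
    if method == "HEAD" and bucket and not key:
        return await backend.head_bucket(bucket)            # HEAD /bucket
    if method == "PUT" and key:
        return await backend.put_object(bucket, key, body)  # PUT /bucket/key
    if method == "DELETE" and key:
        return await backend.delete_object(bucket, key)     # DELETE /bucket/key
    raise NotImplementedError(f"{method} {path} is not handled in this sketch")


class RecordingBackend:
    """Hypothetical test double that records which backend method was reached."""

    def __init__(self) -> None:
        self.calls: List[Tuple[Any, ...]] = []

    def __getattr__(self, name: str):
        # Only invoked for attributes not found normally, i.e. the backend methods.
        async def record(*args: Any) -> None:
            self.calls.append((name, *args))
        return record


if __name__ == "__main__":
    backend = RecordingBackend()
    asyncio.run(dispatch_s3(backend, "PUT", "/photos/cat.jpg", b"..."))
    print(backend.calls)  # [('put_object', 'photos', 'cat.jpg', b'...')]
```

Because the dispatcher only ever touches the object it is handed, a recording double is enough to exercise it; nothing in the adapter refers to HF or Xet.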
Backend interface
StorageBackend is an abstract class in hugbucket.core.backend that defines 15 async methods. Protocol adapters depend only on this contract.
Provider layer
HFStorageBackend (also aliased as Bridge) in hugbucket.bridge implements StorageBackend by orchestrating calls to HubClient and CASClient.
Data models
Two dataclasses in hugbucket.core.models (hugbucket/core/models.py) are shared across every layer:
BucketInfo represents a bucket returned by list_buckets or head_bucket. BucketFile represents a single file or directory entry, carrying the xet_hash that is the key to content-addressable retrieval.
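A rough sketch of the two dataclasses follows. Only the class names and the xet_hash field come from this page; name, created_at, path, size and is_dir are assumed fields added for illustration, and the real definitions in hugbucket/core/models.py may differ.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class BucketInfo:
    """A bucket as returned by list_buckets / head_bucket (fields are assumptions)."""
    name: str
    created_at: Optional[datetime] = None  # hypothetical field


@dataclass
class BucketFile:
    """One file or directory entry in a bucket (only xet_hash is documented)."""
    path: str
    size: int = 0
    is_dir: bool = False
    # Content hash used to look the data up in Xet CAS.
    # Assumption: directories carry no hash, so this stays None for them.
    xet_hash: Optional[str] = None


if __name__ == "__main__":
    entry = BucketFile(path="images/cat.jpg", size=2048, xet_hash="abc123")
    print(entry.xet_hash)  # abc123
```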
StorageBackend abstract interface
Every protocol adapter calls only the methods defined on StorageBackend. This isolation means the S3 and FTP layers can be tested with a mock backend and swapped independently of the HF implementation.
The interface is defined in hugbucket/core/backend.py.
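The sketch below illustrates that isolation with an abstract base class declaring four of the methods named on this page, plus a dict-backed test double. The signatures are assumptions, the real class declares 15 methods, and list_buckets here returns plain bucket names rather than BucketInfo objects to keep the example self-contained.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class StorageBackend(ABC):
    """Illustrative subset of the contract (signatures are assumptions)."""

    @abstractmethod
    async def list_buckets(self) -> List[str]: ...

    @abstractmethod
    async def put_object(self, bucket: str, key: str, data: bytes) -> None: ...

    @abstractmethod
    async def delete_object(self, bucket: str, key: str) -> None: ...

    @abstractmethod
    async def copy_object(self, src_bucket: str, src_key: str,
                          dst_bucket: str, dst_key: str) -> None: ...


class InMemoryBackend(StorageBackend):
    """Dict-backed double: lets protocol adapters run without HF Hub or Xet CAS."""

    def __init__(self) -> None:
        self.objects: Dict[Tuple[str, str], bytes] = {}

    async def list_buckets(self) -> List[str]:
        # Buckets exist only implicitly, via the (bucket, key) pairs stored.
        return sorted({bucket for bucket, _ in self.objects})

    async def put_object(self, bucket: str, key: str, data: bytes) -> None:
        self.objects[(bucket, key)] = data

    async def delete_object(self, bucket: str, key: str) -> None:
        self.objects.pop((bucket, key), None)

    async def copy_object(self, src_bucket: str, src_key: str,
                          dst_bucket: str, dst_key: str) -> None:
        self.objects[(dst_bucket, dst_key)] = self.objects[(src_bucket, src_key)]


if __name__ == "__main__":
    backend = InMemoryBackend()
    asyncio.run(backend.put_object("docs", "readme.md", b"# hi"))
    print(asyncio.run(backend.list_buckets()))  # ['docs']
```

Declaring the methods with abstractmethod means a provider that forgets one of them fails at instantiation time rather than on the first client request.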
Namespace resolution
At startup, the server calls resolve_namespace(), which issues a single GET /api/whoami-v2 request to HF Hub using the configured HF_TOKEN. The returned username is stored in Config.hf_namespace and prepended to every bucket operation as {hf_namespace}/{bucket_name}.
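A minimal sketch of this lookup, assuming the standard Hub endpoint at https://huggingface.co/api/whoami-v2, which returns the token owner's username in its name field. The function signatures, the fetch parameter (injected so the logic can be tested offline), qualified_name, and the early return for an explicitly configured namespace are assumptions rather than HugBucket's actual code.

```python
import json
import urllib.request
from typing import Callable, Optional

WHOAMI_URL = "https://huggingface.co/api/whoami-v2"


def fetch_whoami(token: str) -> dict:
    """Issue GET /api/whoami-v2 with the token as a bearer credential."""
    request = urllib.request.Request(
        WHOAMI_URL, headers={"Authorization": f"Bearer {token}"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def resolve_namespace(token: str, configured: Optional[str] = None,
                      fetch: Callable[[str], dict] = fetch_whoami) -> str:
    """Hypothetical: prefer an explicit namespace, else ask the Hub who owns the token."""
    if configured:
        return configured            # e.g. an organization name set in config
    return fetch(token)["name"]      # single whoami request at startup


def qualified_name(namespace: str, bucket: str) -> str:
    """Build the {hf_namespace}/{bucket_name} identifier used for bucket operations."""
    return f"{namespace}/{bucket}"


if __name__ == "__main__":
    ns = resolve_namespace("hf_dummy", fetch=lambda _token: {"name": "alice"})
    print(qualified_name(ns, "photos"))  # alice/photos
```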
hf_namespace can also be set explicitly in config to target an organization rather than a personal account.
Caching layers
HugBucket maintains four in-process caches to minimise round-trips to HF Hub and Xet CAS:
| Cache | Type | Capacity | TTL |
|---|---|---|---|
| Xorb cache | LRU (bounded by memory) | 512 MiB | Eviction only |
| File info cache | LRU (bounded by entry count) | 256 entries | 30 seconds |
| Reconstruction cache | LRU (bounded by entry count) | 1 024 entries | 5 minutes |
| Read token cache | Dict (per bucket) | Unbounded | Expires 60 s before token expiry |
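The entry-count caches combine two eviction rules: least-recently-used once capacity is reached, and age once the TTL has passed. The sketch below assumes an OrderedDict-based design with an injectable clock for testing; TTLLRUCache and token_is_usable are hypothetical names, and the memory-bounded xorb cache, which would track byte sizes instead of entry counts, is not covered.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional, Tuple


class TTLLRUCache:
    """Hypothetical LRU cache bounded by entry count, with per-entry expiry."""

    def __init__(self, capacity: int, ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.ttl = ttl
        self.clock = clock
        self._data: "OrderedDict[Hashable, Tuple[float, Any]]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        item = self._data.get(key)
        if item is None:
            return None
        stored_at, value = item
        if self.clock() - stored_at > self.ttl:
            del self._data[key]           # stale: drop it and report a miss
            return None
        self._data.move_to_end(key)       # mark as most recently used
        return value

    def put(self, key: Hashable, value: Any) -> None:
        self._data[key] = (self.clock(), value)
        self._data.move_to_end(key)
        while len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry

    def invalidate(self, key: Hashable) -> None:
        self._data.pop(key, None)


def token_is_usable(expires_at: float, now: float, margin: float = 60.0) -> bool:
    """Treat a read token as expired `margin` seconds before its real expiry."""
    return now < expires_at - margin


if __name__ == "__main__":
    file_info = TTLLRUCache(capacity=256, ttl=30.0)   # values from the table
    file_info.put(("bucket", "key"), {"size": 3})
    print(file_info.get(("bucket", "key")))  # {'size': 3}
```

Using time.monotonic rather than wall-clock time keeps TTL checks immune to system clock adjustments. This sketch cannot tell a cached None from a miss; a production cache would use a sentinel value.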
The related config fields are defined in hugbucket/config.py.
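As a hedged illustration of how the limits in the table could map onto configuration, the dataclass below uses the table's values. Only Config, hf_namespace and the HF_TOKEN environment variable are named on this page; every other field name is hypothetical.

```python
import os
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Config:
    """Illustrative config; most field names here are hypothetical."""
    hf_token: str = field(default_factory=lambda: os.environ.get("HF_TOKEN", ""))
    hf_namespace: Optional[str] = None           # filled by resolve_namespace() if unset
    # Cache limits mirror the table above; these field names are assumptions.
    xorb_cache_bytes: int = 512 * 1024 * 1024    # 512 MiB
    file_info_cache_size: int = 256
    file_info_ttl_seconds: float = 30.0
    reconstruction_cache_size: int = 1024
    reconstruction_ttl_seconds: float = 300.0    # 5 minutes
    read_token_margin_seconds: float = 60.0      # refresh this long before expiry


if __name__ == "__main__":
    cfg = Config(hf_token="hf_dummy")
    print(cfg.xorb_cache_bytes)  # 536870912
```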
The file info cache has a deliberately short TTL (30 seconds) to keep metadata reasonably consistent when multiple clients are writing to the same bucket. After any mutation (put_object, delete_object, copy_object), the affected key is immediately evicted via _invalidate_file_info.
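The following sketch shows the write-then-evict pattern: every mutating call drops the cached metadata for the key it touched, so the next lookup refetches it instead of serving a stale entry until the TTL runs out. CachingBackendSketch and head_object are hypothetical names, a plain dict stands in for the TTL cache, and _invalidate_file_info is given an assumed (bucket, key) signature.

```python
import asyncio
from typing import Dict, Optional, Tuple


class CachingBackendSketch:
    """Hypothetical backend showing file-info eviction after each mutation."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], bytes] = {}
        self._file_info: Dict[Tuple[str, str], int] = {}  # (bucket, key) -> size

    def _invalidate_file_info(self, bucket: str, key: str) -> None:
        self._file_info.pop((bucket, key), None)

    async def head_object(self, bucket: str, key: str) -> Optional[int]:
        k = (bucket, key)
        if k not in self._file_info and k in self._store:
            self._file_info[k] = len(self._store[k])  # cache miss: "fetch" metadata
        return self._file_info.get(k)

    async def put_object(self, bucket: str, key: str, data: bytes) -> None:
        self._store[(bucket, key)] = data
        self._invalidate_file_info(bucket, key)

    async def delete_object(self, bucket: str, key: str) -> None:
        self._store.pop((bucket, key), None)
        self._invalidate_file_info(bucket, key)

    async def copy_object(self, src_bucket: str, src_key: str,
                          dst_bucket: str, dst_key: str) -> None:
        self._store[(dst_bucket, dst_key)] = self._store[(src_bucket, src_key)]
        self._invalidate_file_info(dst_bucket, dst_key)  # only the destination changed


if __name__ == "__main__":
    backend = CachingBackendSketch()
    asyncio.run(backend.put_object("b", "k", b"abc"))
    print(asyncio.run(backend.head_object("b", "k")))  # 3
```

Eviction only covers writes that pass through this process; changes made by other clients become visible once the short TTL expires.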