API Reference
Base Classes
- class tosfs.core.TosFileSystem(*args, **kwargs)
Tos file system.
It’s an implementation of AbstractFileSystem which is an abstract super-class for pythonic file-systems.
- __init__(endpoint_url: str | None = None, key: str = '', secret: str = '', region: str | None = None, session_token: str | None = None, max_retry_num: int = 20, max_connections: int = 1024, connection_timeout: int = 10, socket_timeout: int = 30, high_latency_log_threshold: int = 100, version_aware: bool = False, credentials_provider: object | None = None, default_block_size: int | None = None, default_fill_cache: bool = True, default_cache_type: str = 'readahead', multipart_staging_dirs: str = '/tmp/tmpg64hepnu', multipart_size: int = 8388608, multipart_thread_pool_size: int = 2, multipart_staging_buffer_size: int = 4096, multipart_threshold: int = 10485760, enable_crc: bool = True, enable_verify_ssl: bool = True, dns_cache_timeout: int = 0, proxy_host: str | None = None, proxy_port: int | None = None, proxy_username: str | None = None, proxy_password: str | None = None, disable_encoding_meta: bool | None = None, except100_continue_threshold: int = 65536, **kwargs: Any) -> None
Initialise the TosFileSystem.
- Parameters:
endpoint_url (str, optional) – The endpoint URL of the TOS service.
key (str) – The access key ID(ak) to access the TOS service.
secret (str) – The secret access key(sk) to access the TOS service.
region (str, optional) – The region of the TOS service.
session_token (str, optional) – The temporary session token to access the TOS service.
max_retry_num (int, optional) – The maximum number of retries for a failed request (default is 20).
max_connections (int, optional) – The maximum number of HTTP connections that can be opened in the connection pool (default is 1024).
connection_timeout (int, optional) – The time to keep a connection open in seconds (default is 10).
socket_timeout (int, optional) – The socket read and write timeout, in seconds, for a single request after a connection has been established (default is 30). Reference: https://requests.readthedocs.io/en/latest/user/quickstart/#timeouts
high_latency_log_threshold (int, optional) – The threshold for logging high-latency operations, in KB (default is 100). A value greater than 0 enables high-latency logging. When the overall transfer rate of a single request is lower than this value and the total request time exceeds 500 milliseconds, a WARN-level log is printed.
version_aware (bool, optional) – Whether the filesystem is version aware (default is False). This is not currently supported; do NOT set it to True.
credentials_provider (object, optional) – The credentials provider for the TOS service.
default_block_size (int, optional) – The default block size for reading and writing (default is None).
default_fill_cache (bool, optional) – Whether to fill the cache (default is True).
default_cache_type (str, optional) – The default cache type (default is ‘readahead’).
multipart_staging_dirs (str, optional) – The staging directories for multipart uploads (default is a temporary directory). Separate multiple staging directories with commas.
multipart_size (int, optional) – The part size used for multipart uploads to the given object storage (default is 8 MB).
multipart_thread_pool_size (int, optional) – The size of the thread pool used for uploading parts in parallel to the given object storage (default is max(2, os.cpu_count())).
multipart_staging_buffer_size (int, optional) – The maximum number of bytes of staging data buffered in memory before flushing to the staging file. This greatly reduces random writes to the local staging disk when writing many small files (default is 4096).
multipart_threshold (int, optional) – The threshold that controls whether multipart upload is used when writing data to the given object storage. If the data size is less than the threshold, the data is written with a simple put instead of a multipart upload (default is 10 MB).
enable_crc (bool) – Whether to enable the client-side CRC check after upload (default is True).
enable_verify_ssl (bool) – Whether to verify the SSL certificate (default is True).
dns_cache_timeout (int) – The DNS cache timeout in minutes. A value less than or equal to 0 disables the DNS cache (default is 0).
proxy_host (str, optional) – The host address of the proxy server. Currently only the HTTP protocol is supported.
proxy_port (int, optional) – The port of the proxy server.
proxy_username (str, optional) – The username to use when connecting to the proxy server.
proxy_password (str, optional) – The password to use when connecting to the proxy server.
disable_encoding_meta (bool, optional) – Whether to disable encoding of user-defined metadata (x-tos-meta-*) and Content-Disposition. These values are encoded by default; set to True to disable encoding.
except100_continue_threshold (int) – When greater than 0, upload-related interfaces enable the 100-continue mechanism for requests whose upload data length exceeds this threshold. If the data length cannot be determined in advance, it is treated as exceeding the threshold. The unit is bytes (default is 65536).
kwargs (Any, optional) – Additional arguments.
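The multipart and staging parameters above can be illustrated with a minimal sketch. This is not the library's own logic: `plan_upload` and `split_staging_dirs` are hypothetical helpers that only restate the documented `multipart_threshold`, `multipart_size`, and `multipart_staging_dirs` semantics, and the behaviour at exactly the threshold is an assumption. `make_filesystem` imports `tosfs` lazily, so the rest of the block runs without the package installed; the endpoint, region, credential, and directory values are placeholders.

```python
import math
from typing import Any, List, Tuple

MULTIPART_THRESHOLD = 10 * 2**20  # documented default: 10485760 bytes
MULTIPART_SIZE = 8 * 2**20        # documented default: 8388608 bytes


def plan_upload(size: int,
                threshold: int = MULTIPART_THRESHOLD,
                part_size: int = MULTIPART_SIZE) -> Tuple[str, int]:
    """Return ("put", 1) or ("multipart", number_of_parts) for a payload size.

    Assumption: a payload of exactly `threshold` bytes goes multipart,
    because the docs only state that sizes below the threshold use a simple put.
    """
    if size < threshold:
        return ("put", 1)
    return ("multipart", math.ceil(size / part_size))


def split_staging_dirs(value: str) -> List[str]:
    """Split a comma-separated multipart_staging_dirs string into paths."""
    return [p.strip() for p in value.split(",") if p.strip()]


def make_filesystem(**overrides: Any):
    """Build a TosFileSystem; every value below is a placeholder."""
    from tosfs.core import TosFileSystem  # imported lazily on purpose

    options = dict(
        endpoint_url="https://tos.example.invalid",  # placeholder endpoint
        region="example-region",                     # placeholder region
        key="YOUR_ACCESS_KEY",
        secret="YOUR_SECRET_KEY",
        multipart_staging_dirs="/tmp/stage1,/tmp/stage2",
    )
    options.update(overrides)
    return TosFileSystem(**options)
```

Keeping the size decision in a small pure function makes it easy to check how a given `multipart_threshold` and `multipart_size` combination would split a payload before touching the network.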
- cp_file(path1: str, path2: str, preserve_etag: bool | None = None, managed_copy_threshold: int | None = 5368709120, **kwargs: Any) -> None
Copy a file between locations on TOS.
- Parameters:
path1 (str) – The source path of the file to copy.
path2 (str) – The destination path of the file to copy.
preserve_etag (bool, optional) – Whether to preserve the ETag while copying. If the file was uploaded as a single part, the ETag always equals the MD5 hash of the file, so it is always preserved. If the file was uploaded in multiple parts, this option tries to reproduce the same multipart upload while copying so that the generated ETag is preserved.
managed_copy_threshold (int, optional) – The threshold size of the file to copy using managed copy. If the size of the file is greater than this threshold, then the file will be copied using managed copy (default is 5 * 2**30).
**kwargs (Any, optional) – Additional arguments.
- Raises:
FileNotFoundError – If the source file does not exist.
ValueError – If the destination is a versioned file.
TosClientError – If there is a client error while copying the file.
TosServerError – If there is a server error while copying the file.
TosfsError – If there is an unknown error while copying the file.
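The `managed_copy_threshold` rule can be restated as a one-line check. The helper below is a hypothetical illustration of the documented "greater than this threshold" wording, not code taken from the library.

```python
DEFAULT_MANAGED_COPY_THRESHOLD = 5 * 2**30  # 5368709120 bytes, per the signature


def uses_managed_copy(size: int,
                      threshold: int = DEFAULT_MANAGED_COPY_THRESHOLD) -> bool:
    """True when a file of `size` bytes is strictly larger than the threshold."""
    return size > threshold
```

Under this reading, a file of exactly 5 GiB would still be copied without managed copy.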
- exists(path: str, **kwargs: Any) -> bool
Check if a path exists in the TOS.
- Parameters:
path (str) – The path to check for existence.
**kwargs (Any, optional) – Additional arguments if needed in the future.
- Returns:
True if the path exists, False otherwise.
- Return type:
bool
- Raises:
tos.exceptions.TosClientError – If there is a client error while checking the path.
tos.exceptions.TosServerError – If there is a server error while checking the path.
TosfsError – If there is an unknown error while checking the path.
Examples
>>> fs = TosFileSystem()
>>> fs.exists("tos://bucket/to/file")
True
>>> fs.exists("tos://mybucket/nonexistentfile")
False
- expand_path(path: str | List[str], recursive: bool = False, maxdepth: int | None = None) -> List[str]
Expand path to a list of files.
- Parameters:
path (str or list of str) – The path or paths to expand.
recursive (bool, optional) – Whether to expand recursively (default is False).
maxdepth (int, optional) – The maximum depth to expand to (default is None).
**kwargs (Any, optional) – Additional arguments.
- Returns:
A list of expanded paths.
- Return type:
List[str]
- find(path: str, maxdepth: int | None = None, withdirs: bool = False, detail: bool = False, prefix: str = '', **kwargs: Any) -> List[str] | dict
Find all files or dirs with conditions.
Like the POSIX find command, without conditions.
- Parameters:
path (str) – The path to search.
maxdepth (int, optional) – If not None, the maximum number of levels to descend.
withdirs (bool) – Whether to include directory paths in the output. This is True when used by glob, but users usually only want files.
prefix (str) – Only return files that match ^{path}/{prefix} (if there is an exact match, filename == {path}/{prefix}, it will also be included).
detail (bool) – If True, return a dict with file information, else just the path.
**kwargs (Any) – Additional arguments.
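The `prefix` rule above can be pictured as a plain string filter. The sketch below applies the `^{path}/{prefix}` match to an in-memory list of names; `filter_by_prefix` is a hypothetical helper for illustration and says nothing about how the library performs the listing.

```python
from typing import Iterable, List


def filter_by_prefix(names: Iterable[str], path: str, prefix: str = "") -> List[str]:
    """Keep names that start with "{path}/{prefix}", including an exact match."""
    base = path.rstrip("/") + "/" + prefix
    return [name for name in names if name.startswith(base)]


names = [
    "bucket/data/part-0",
    "bucket/data/part-1",
    "bucket/data/_SUCCESS",
    "bucket/data/part",      # exact match of {path}/{prefix}
    "bucket/other/part-0",
]
matches = filter_by_prefix(names, "bucket/data", "part")
```

With an empty `prefix`, the filter reduces to "everything under `path`".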
- get_file(rpath: str, lpath: str, **kwargs: Any) -> None
Get a file from the TOS filesystem and write to a local path.
This method retries the download if an error occurs.
- Parameters:
rpath (str) – The remote path of the file to get.
lpath (str) – The local path to save the file.
**kwargs (Any, optional) – Additional arguments.
- Raises:
FileNotFoundError – If the file does not exist.
tos.exceptions.TosClientError – If there is a client error while getting the file.
tos.exceptions.TosServerError – If there is a server error while getting the file.
TosfsError – If there is an unknown error while getting the file.
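The retry behaviour mentioned for get_file is not specified in detail here, so the following is only a generic sketch of retrying a callable with exponential backoff. `call_with_retries`, its defaults, and the choice of retryable exception types are assumptions for illustration and do not describe the library's actual retry policy.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(fn: Callable[[], T],
                      attempts: int = 3,
                      base_delay: float = 0.0,
                      retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
                      ) -> T:
    """Call fn(), retrying on the given exception types with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Errors outside `retry_on`, such as FileNotFoundError for a missing source object, propagate immediately instead of being retried.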
- glob(path: str, maxdepth: int | None = None, **kwargs: Any) -> Collection[Any]
Return list of paths matching a glob-like pattern.
- Parameters:
path (str) – The path to search.
maxdepth (int, optional) – The maximum depth to search to (default is None).
**kwargs (Any, optional) – Additional arguments.
- info(path: str, bucket: str | None = None, key: str | None = None, version_id: str | None = None) -> dict
Give details of entry at path.
Returns a single dictionary, with exactly the same information as ls would with detail=True.
The default implementation calls ls and could be overridden by a shortcut. kwargs are passed on to ls().
Some file systems might not be able to measure the file’s size, in which case the returned dict will include 'size': None.
- Returns:
dict with keys: name (full path in the FS), size (in bytes), type (file, directory, or something else) and other FS-specific keys.
- isdir(path: str) -> bool
Check if the path is a directory.
- Parameters:
path (str) – The path to check.
- Returns:
True if the path is a directory, False otherwise.
- Return type:
bool
- Raises:
TosClientError – If there is a client error while accessing the path.
TosServerError – If there is a server error while accessing the path.
TosfsError – If there is an unknown error while accessing the path.
Examples
>>> fs = TosFileSystem()
>>> fs.isdir("tos://mybucket/mydir/")
- isfile(path: str) -> bool
Check if the path is a file.
- Parameters:
path (str) – The path to check.
- Returns:
True if the path is a file, False otherwise.
- Return type:
bool
- ls(path: str, detail: bool = False, versions: bool = False, **kwargs: str | bool | float | None) -> List[dict] | List[str]
List objects under the given path.
- Parameters:
path (str) – The path to list.
detail (bool, optional) – Whether to return detailed information (default is False).
versions (bool, optional) – Whether to list object versions (default is False).
**kwargs (dict, optional) – Additional arguments.
- Returns:
A list of objects under the given path. If detail is True, returns a list of dictionaries with detailed information. Otherwise, returns a list of object names.
- Return type:
Union[List[dict], List[str]]
- Raises:
IOError – If there is an error accessing the parent directory.
Examples
>>> fs = TosFileSystem()
>>> fs.ls("mybucket")
['mybucket/file1', 'mybucket/file2']
>>> fs.ls("mybucket", detail=True)
[{'name': 'mybucket/file1', 'size': 123, 'type': 'file'}, {'name': 'mybucket/file2', 'size': 456, 'type': 'file'}]
- ls_iterate(path: str, detail: bool = False, versions: bool = False, batch_size: int = 1000, **kwargs: str | bool | float | None) -> Generator[dict | str, None, None]
List objects under the given path in batches and return an iterator.
- Parameters:
path (str) – The path to list.
detail (bool, optional) – Whether to return detailed information (default is False).
versions (bool, optional) – Whether to list object versions (default is False).
batch_size (int, optional) – The number of items to fetch in each batch (default is 1000).
**kwargs (dict, optional) – Additional arguments.
- Returns:
An iterator that yields objects under the given path.
- Return type:
Generator[Union[dict, str], None, None]
- Raises:
ValueError – If versions is specified but the filesystem is not version aware.
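The batching idea behind ls_iterate can be sketched without any network access: items are fetched `batch_size` at a time but handed to the caller one by one, so a consumer that stops early never triggers the later fetches. `iterate_in_batches`, `fake_fetch`, and the in-memory `LISTING` are hypothetical stand-ins, not part of tosfs.

```python
from itertools import islice
from typing import Callable, Iterator, List, Sequence


def iterate_in_batches(fetch_page: Callable[[int, int], Sequence[str]],
                       batch_size: int = 1000) -> Iterator[str]:
    """Yield items one at a time while fetching them batch_size at a time."""
    offset = 0
    while True:
        page = fetch_page(offset, batch_size)
        yield from page
        if len(page) < batch_size:  # a short page means the listing is exhausted
            return
        offset += len(page)


# Hypothetical in-memory listing standing in for a remote bucket.
LISTING = [f"mybucket/file{i}" for i in range(2500)]
requests: List[int] = []  # records the offset of every simulated fetch


def fake_fetch(offset: int, limit: int) -> Sequence[str]:
    requests.append(offset)
    return LISTING[offset:offset + limit]


# Taking only five items touches just the first batch.
first_five = list(islice(iterate_in_batches(fake_fetch, batch_size=1000), 5))
```

This laziness is the usual reason to prefer a generator over a fully materialised list when a prefix holds many objects.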
- makedirs(path: str, exist_ok: bool = False) -> None
Recursively make directories.
Creates directory at path and any intervening required directories. Raises exception if, for instance, the path already exists but is a file.
- Parameters:
path (str) – leaf directory name
exist_ok (bool, optional) – If False (the default), raise an error if the target already exists.
- mkdir(path: str, create_parents: bool = True, **kwargs: Any) -> None
Create directory entry at path.
For systems that don’t have true directories, may create an object for this instance only and not touch the real filesystem
- Parameters:
path (str) – location
create_parents (bool) – if True, this is equivalent to makedirs
kwargs (Any) – may be permissions, etc.
- put_file(lpath: str, rpath: str, chunksize: int = 5242880, **kwargs: Any) -> None
Put a file from local to TOS.
- Parameters:
lpath (str) – The local path of the file to put.
rpath (str) – The remote path of the file to receive.
chunksize (int, optional) – The size of the chunks to read from the file (default is 5 * 2**20).
**kwargs (Any, optional) – Additional arguments.
- Raises:
FileNotFoundError – If the local file does not exist.
TosClientError – If there is a client error while putting the file.
TosServerError – If there is a server error while putting the file.
TosfsError – If there is an unknown error while putting the file.
Examples
>>> fs = TosFileSystem()
>>> fs.put_file("localfile.txt", "tos://mybucket/remote.txt")
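The `chunksize` parameter describes how much of the local file is read at a time. A minimal sketch of chunked reading follows; `iter_chunks` is a hypothetical helper and only shows the reading pattern, not how put_file sends the data.

```python
import io
from typing import BinaryIO, Iterator

DEFAULT_CHUNKSIZE = 5 * 2**20  # 5242880 bytes, the documented default


def iter_chunks(stream: BinaryIO, chunksize: int = DEFAULT_CHUNKSIZE) -> Iterator[bytes]:
    """Yield successive blocks of at most chunksize bytes until the stream ends."""
    if chunksize <= 0:
        raise ValueError("chunksize must be positive")
    while True:
        chunk = stream.read(chunksize)
        if not chunk:
            return
        yield chunk


# A 10-byte stream read 4 bytes at a time gives blocks of 4, 4 and 2 bytes.
sizes = [len(c) for c in iter_chunks(io.BytesIO(b"x" * 10), chunksize=4)]
```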
- rm(path: str, recursive: bool = False, maxdepth: int | None = None) -> None
Delete files.
- Parameters:
path (str or list of str) – File(s) to delete.
recursive (bool) – If file(s) are directories, recursively delete contents and then also remove the directory
maxdepth (int or None) – Depth to pass to walk for finding files to delete, if recursive. If None, there will be no limit and infinite recursion may be possible.
- rmdir(path: str) -> None
Remove a directory if it is empty.
- Parameters:
path (str) – The path of the directory to remove. The path should be in the format tos://bucket/path/to/directory.
- Raises:
FileNotFoundError – If the directory does not exist.
NotADirectoryError – If the path is not a directory.
TosfsError – If the directory is not empty, or the path is a bucket.
Examples
>>> fs = TosFileSystem()
>>> fs.rmdir("tos://mybucket/mydir/")
- touch(path: str, truncate: bool = True, **kwargs: Any) -> None
Create an empty file at the given path.
- Parameters:
path (str) – The path of the file to create.
truncate (bool, optional) – Whether to truncate the file if it already exists (default is True).
**kwargs (Any, optional) – Additional arguments.
- Raises:
FileExistsError – If the file already exists and truncate is False.
TosfsError – If there is an unknown error while creating the file.
tos.exceptions.TosClientError – If there is a client error while creating the file.
tos.exceptions.TosServerError – If there is a server error while creating the file.
Examples
>>> fs = TosFileSystem()
>>> fs.touch("tos://mybucket/myfile")
- walk(path: str, maxdepth: int | None = None, topdown: bool = True, on_error: str = 'omit', **kwargs: Any) -> Generator[str, List[str], List[str]]
List objects under the given path.
- Parameters:
path (str) – The path to list.
maxdepth (int, optional) – The maximum depth to walk to (default is None).
topdown (bool, optional) – Whether to walk top-down or bottom-up (default is True).
on_error (str, optional) – How to handle errors (default is ‘omit’).
**kwargs (Any, optional) – Additional arguments.
- Raises:
ValueError – If the path is an invalid path.
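The effect of `topdown` can be seen with the standard library's os.walk on a local temporary directory. This sketch assumes that walk here follows the same (root, dirs, files) convention as os.walk, as fsspec's AbstractFileSystem.walk does; it uses the local filesystem as a stand-in and does not call tosfs.

```python
import os
import tempfile
from typing import List


def walk_order(root: str, topdown: bool = True) -> List[str]:
    """Return directory paths (relative to root) in the order os.walk visits them."""
    visited = []
    for current, _dirs, _files in os.walk(root, topdown=topdown):
        visited.append(os.path.relpath(current, root))
    return visited


# Build a small chain root/a/b and record both traversal orders.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "a", "b"))
    top_down = walk_order(root, topdown=True)
    bottom_up = walk_order(root, topdown=False)
```

Top-down traversal visits a directory before its children, which suits pruning; bottom-up visits children first, which suits deletion.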