API Reference
FilePath Objects
-
class path2insight.WindowsFilePath(*args)
Object to analyse a Windows file or folder path.
WindowsFilePath inherits from pathlib.PureWindowsPath, which is available in Python 3.4 and later. See the pathlib documentation for all properties and methods.
Examples:
>>> p = WindowsFilePath("D://Documents/ProjectX/DEMO code.py")
>>> str(p)
'D:\Documents\ProjectX\DEMO code.py'
>>> p.lower_name().tokenize_stem()
['demo', 'code']
>>> p.extension
'.py'
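Because the class inherits from pathlib.PureWindowsPath, the inherited properties can be explored with the standard library alone. The following is a minimal sketch that uses PureWindowsPath as a stand-in, on the assumption that the inherited members behave the same on the subclass; it does not exercise any path2insight-specific method.

```python
from pathlib import PureWindowsPath

# Stand-in for WindowsFilePath (assumption: inherited pathlib members are
# unchanged by the subclass).
p = PureWindowsPath("D:/Documents/ProjectX/DEMO code.py")

drive = p.drive              # drive prefix of the path
name = p.name                # final path component
stem = p.stem                # final component without its last suffix
suffix = p.suffix            # last suffix of the final component
parent_name = p.parent.name  # name of the containing folder

print(drive, name, stem, suffix, parent_name, sep=" | ")
```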
-
anchor
¶ The concatenation of the drive and root, or ‘’.
-
as_posix
()¶ Return the string representation of the path with forward (/) slashes.
-
as_uri
()¶ Return the path as a ‘file’ URI.
-
capitalize
(*args, **kwargs)¶ Apply string function ‘capitalize’ to filename parts.
-
capitalize_name
(*args, **kwargs)¶ Apply string function ‘capitalize’ to the name.
-
capitalize_stem
(*args, **kwargs)¶ Apply string function ‘capitalize’ to the stem.
-
casefold
(*args, **kwargs)¶ Apply string function ‘casefold’ to filename parts.
-
casefold_name
(*args, **kwargs)¶ Apply string function ‘casefold’ to the name.
-
casefold_stem
(*args, **kwargs)¶ Apply string function ‘casefold’ to the stem.
-
center
(*args, **kwargs)¶ Apply string function ‘center’ to filename parts.
-
center_name
(*args, **kwargs)¶ Apply string function ‘center’ to the name.
-
center_stem
(*args, **kwargs)¶ Apply string function ‘center’ to the stem.
-
count
(*args, **kwargs)¶ Apply string function ‘count’ to filename parts.
-
count_name
(*args, **kwargs)¶ Apply string function ‘count’ to the name.
-
count_stem
(*args, **kwargs)¶ Apply string function ‘count’ to the stem.
-
depth
¶ Compute the depth of the path.
Example:
>>> WindowsFilePath('R:/Armel/path2insight/demo.py').depth
3
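The example value of 3 is consistent with counting the path components below the anchor (drive and root). The sketch below reproduces that number with the standard library; the helper name depth_sketch and the handling of relative paths are assumptions, not the library's implementation.

```python
from pathlib import PureWindowsPath


def depth_sketch(path):
    """Count the components below the anchor (hypothetical helper)."""
    # The anchor (for example 'R:\\') is the first entry of parts when present.
    return len(path.parts) - (1 if path.anchor else 0)


example_depth = depth_sketch(PureWindowsPath("R:/Armel/path2insight/demo.py"))
relative_depth = depth_sketch(PureWindowsPath("docs/readme.txt"))
print(example_depth, relative_depth)
```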
-
drive
¶ The drive prefix (letter or UNC path), if any.
-
encode
(*args, **kwargs)¶ Apply string function ‘encode’ to filename parts.
-
encode_name
(*args, **kwargs)¶ Apply string function ‘encode’ to the name.
-
encode_stem
(*args, **kwargs)¶ Apply string function ‘encode’ to the stem.
-
endswith
(*args, **kwargs)¶ Apply string function ‘endswith’ to filename parts.
-
endswith_name
(*args, **kwargs)¶ Apply string function ‘endswith’ to the name.
-
endswith_stem
(*args, **kwargs)¶ Apply string function ‘endswith’ to the stem.
-
expandtabs
(*args, **kwargs)¶ Apply string function ‘expandtabs’ to filename parts.
-
expandtabs_name
(*args, **kwargs)¶ Apply string function ‘expandtabs’ to the name.
-
expandtabs_stem
(*args, **kwargs)¶ Apply string function ‘expandtabs’ to the stem.
-
extension
¶ Masked property from self.suffix
-
extensions
¶ Masked property from self.suffixes
-
find
(*args, **kwargs)¶ Apply string function ‘find’ to filename parts.
-
find_name
(*args, **kwargs)¶ Apply string function ‘find’ to the name.
-
find_stem
(*args, **kwargs)¶ Apply string function ‘find’ to the stem.
-
format
(*args, **kwargs)¶ Apply string function ‘format’ to filename parts.
-
format_map
(*args, **kwargs)¶ Apply string function ‘format_map’ to filename parts.
-
format_map_name
(*args, **kwargs)¶ Apply string function ‘format_map’ to the name.
-
format_map_stem
(*args, **kwargs)¶ Apply string function ‘format_map’ to the stem.
-
format_name
(*args, **kwargs)¶ Apply string function ‘format’ to the name.
-
format_stem
(*args, **kwargs)¶ Apply string function ‘format’ to the stem.
-
index
(*args, **kwargs)¶ Apply string function ‘index’ to filename parts.
-
index_name
(*args, **kwargs)¶ Apply string function ‘index’ to the name.
-
index_stem
(*args, **kwargs)¶ Apply string function ‘index’ to the stem.
-
is_absolute
()¶ True if the path is absolute (has both a root and, if applicable, a drive).
-
is_reserved
()¶ Return True if the path contains one of the special names reserved by the system, if any.
-
isalnum
(*args, **kwargs)¶ Apply string function ‘isalnum’ to filename parts.
-
isalnum_name
(*args, **kwargs)¶ Apply string function ‘isalnum’ to the name.
-
isalnum_stem
(*args, **kwargs)¶ Apply string function ‘isalnum’ to the stem.
-
isalpha
(*args, **kwargs)¶ Apply string function ‘isalpha’ to filename parts.
-
isalpha_name
(*args, **kwargs)¶ Apply string function ‘isalpha’ to the name.
-
isalpha_stem
(*args, **kwargs)¶ Apply string function ‘isalpha’ to the stem.
-
isascii
(*args, **kwargs)¶ Apply string function ‘isascii’ to filename parts.
-
isascii_name
(*args, **kwargs)¶ Apply string function ‘isascii’ to the name.
-
isascii_stem
(*args, **kwargs)¶ Apply string function ‘isascii’ to the stem.
-
isdecimal
(*args, **kwargs)¶ Apply string function ‘isdecimal’ to filename parts.
-
isdecimal_name
(*args, **kwargs)¶ Apply string function ‘isdecimal’ to the name.
-
isdecimal_stem
(*args, **kwargs)¶ Apply string function ‘isdecimal’ to the stem.
-
isdigit
(*args, **kwargs)¶ Apply string function ‘isdigit’ to filename parts.
-
isdigit_name
(*args, **kwargs)¶ Apply string function ‘isdigit’ to the name.
-
isdigit_stem
(*args, **kwargs)¶ Apply string function ‘isdigit’ to the stem.
-
isidentifier
(*args, **kwargs)¶ Apply string function ‘isidentifier’ to filename parts.
-
isidentifier_name
(*args, **kwargs)¶ Apply string function ‘isidentifier’ to the name.
-
isidentifier_stem
(*args, **kwargs)¶ Apply string function ‘isidentifier’ to the stem.
-
islower
(*args, **kwargs)¶ Apply string function ‘islower’ to filename parts.
-
islower_name
(*args, **kwargs)¶ Apply string function ‘islower’ to the name.
-
islower_stem
(*args, **kwargs)¶ Apply string function ‘islower’ to the stem.
-
isnumeric
(*args, **kwargs)¶ Apply string function ‘isnumeric’ to filename parts.
-
isnumeric_name
(*args, **kwargs)¶ Apply string function ‘isnumeric’ to the name.
-
isnumeric_stem
(*args, **kwargs)¶ Apply string function ‘isnumeric’ to the stem.
-
isprintable
(*args, **kwargs)¶ Apply string function ‘isprintable’ to filename parts.
-
isprintable_name
(*args, **kwargs)¶ Apply string function ‘isprintable’ to the name.
-
isprintable_stem
(*args, **kwargs)¶ Apply string function ‘isprintable’ to the stem.
-
isspace
(*args, **kwargs)¶ Apply string function ‘isspace’ to filename parts.
-
isspace_name
(*args, **kwargs)¶ Apply string function ‘isspace’ to the name.
-
isspace_stem
(*args, **kwargs)¶ Apply string function ‘isspace’ to the stem.
-
istitle
(*args, **kwargs)¶ Apply string function ‘istitle’ to filename parts.
-
istitle_name
(*args, **kwargs)¶ Apply string function ‘istitle’ to the name.
-
istitle_stem
(*args, **kwargs)¶ Apply string function ‘istitle’ to the stem.
-
isupper
(*args, **kwargs)¶ Apply string function ‘isupper’ to filename parts.
-
isupper_name
(*args, **kwargs)¶ Apply string function ‘isupper’ to the name.
-
isupper_stem
(*args, **kwargs)¶ Apply string function ‘isupper’ to the stem.
-
join
(*args, **kwargs)¶ Apply string function ‘join’ to filename parts.
-
join_name
(*args, **kwargs)¶ Apply string function ‘join’ to the name.
-
join_stem
(*args, **kwargs)¶ Apply string function ‘join’ to the stem.
-
joinpath
(*args)¶ Combine this path with one or several arguments, and return a new path representing either a subpath (if all arguments are relative paths) or a totally different path (if one of the arguments is anchored).
-
ljust
(*args, **kwargs)¶ Apply string function ‘ljust’ to filename parts.
-
ljust_name
(*args, **kwargs)¶ Apply string function ‘ljust’ to the name.
-
ljust_stem
(*args, **kwargs)¶ Apply string function ‘ljust’ to the stem.
-
lower
(*args, **kwargs)¶ Apply string function ‘lower’ to filename parts.
-
lower_name
(*args, **kwargs)¶ Apply string function ‘lower’ to the name.
-
lower_stem
(*args, **kwargs)¶ Apply string function ‘lower’ to the stem.
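The class example chains p.lower_name().tokenize_stem(), which suggests that the _name and _stem variants return a new path object rather than a plain string. A rough sketch of that idea with pathlib follows; the helper names are hypothetical and the return type is an assumption based only on that example.

```python
from pathlib import PureWindowsPath


def lower_name_sketch(path):
    """Return a new path whose whole name (stem and suffix) is lowercased."""
    return path.with_name(path.name.lower())


def lower_stem_sketch(path):
    """Return a new path whose stem is lowercased; the suffix is untouched."""
    return path.with_name(path.stem.lower() + path.suffix)


p = PureWindowsPath("D:/Documents/ProjectX/DEMO code.PY")
by_name = lower_name_sketch(p)
by_stem = lower_stem_sketch(p)
print(by_name.name, by_stem.name)
```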
-
lstrip
(*args, **kwargs)¶ Apply string function ‘lstrip’ to filename parts.
-
lstrip_name
(*args, **kwargs)¶ Apply string function ‘lstrip’ to the name.
-
lstrip_stem
(*args, **kwargs)¶ Apply string function ‘lstrip’ to the stem.
-
maketrans
(*args, **kwargs)¶ Apply string function ‘maketrans’ to filename parts.
-
maketrans_name
(*args, **kwargs)¶ Apply string function ‘maketrans’ to the name.
-
maketrans_stem
(*args, **kwargs)¶ Apply string function ‘maketrans’ to the stem.
-
match
(path_pattern)¶ Return True if this path matches the given pattern.
-
name
¶ The final path component, if any.
-
parent
¶ The logical parent of the path.
-
parents
¶ A sequence of this path’s logical parents.
-
partition
(*args, **kwargs)¶ Apply string function ‘partition’ to filename parts.
-
partition_name
(*args, **kwargs)¶ Apply string function ‘partition’ to the name.
-
partition_stem
(*args, **kwargs)¶ Apply string function ‘partition’ to the stem.
-
parts
¶ An object providing sequence-like access to the components in the filesystem path.
-
relative_to
(*other)¶ Return the relative path to another path identified by the passed arguments. If the operation is not possible (because this is not a subpath of the other path), raise ValueError.
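relative_to is inherited from pathlib, so its success and failure cases can be illustrated with PureWindowsPath as a stand-in (assuming the subclass does not override it):

```python
from pathlib import PureWindowsPath

p = PureWindowsPath("F:/data/reports/file.txt")

# Works: F:/data is an ancestor of p.
inside = p.relative_to("F:/data")

# Fails: p is not located below F:/docs, so ValueError is raised.
try:
    p.relative_to("F:/docs")
    outside_error = None
except ValueError as exc:
    outside_error = exc

print(inside, type(outside_error).__name__)
```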
-
replace
(*args, **kwargs)¶ Apply string function ‘replace’ to filename parts.
-
replace_name
(*args, **kwargs)¶ Apply string function ‘replace’ to the name.
-
replace_stem
(*args, **kwargs)¶ Apply string function ‘replace’ to the stem.
-
rfind
(*args, **kwargs)¶ Apply string function ‘rfind’ to filename parts.
-
rfind_name
(*args, **kwargs)¶ Apply string function ‘rfind’ to the name.
-
rfind_stem
(*args, **kwargs)¶ Apply string function ‘rfind’ to the stem.
-
rindex
(*args, **kwargs)¶ Apply string function ‘rindex’ to filename parts.
-
rindex_name
(*args, **kwargs)¶ Apply string function ‘rindex’ to the name.
-
rindex_stem
(*args, **kwargs)¶ Apply string function ‘rindex’ to the stem.
-
rjust
(*args, **kwargs)¶ Apply string function ‘rjust’ to filename parts.
-
rjust_name
(*args, **kwargs)¶ Apply string function ‘rjust’ to the name.
-
rjust_stem
(*args, **kwargs)¶ Apply string function ‘rjust’ to the stem.
-
root
¶ The root of the path, if any.
-
rpartition
(*args, **kwargs)¶ Apply string function ‘rpartition’ to filename parts.
-
rpartition_name
(*args, **kwargs)¶ Apply string function ‘rpartition’ to the name.
-
rpartition_stem
(*args, **kwargs)¶ Apply string function ‘rpartition’ to the stem.
-
rsplit
(*args, **kwargs)¶ Apply string function ‘rsplit’ to filename parts.
-
rsplit_name
(*args, **kwargs)¶ Apply string function ‘rsplit’ to the name.
-
rsplit_stem
(*args, **kwargs)¶ Apply string function ‘rsplit’ to the stem.
-
rstrip
(*args, **kwargs)¶ Apply string function ‘rstrip’ to filename parts.
-
rstrip_name
(*args, **kwargs)¶ Apply string function ‘rstrip’ to the name.
-
rstrip_stem
(*args, **kwargs)¶ Apply string function ‘rstrip’ to the stem.
-
split
(*args, **kwargs)¶ Apply string function ‘split’ to filename parts.
-
split_name
(*args, **kwargs)¶ Apply string function ‘split’ to the name.
-
split_stem
(*args, **kwargs)¶ Apply string function ‘split’ to the stem.
-
splitlines
(*args, **kwargs)¶ Apply string function ‘splitlines’ to filename parts.
-
splitlines_name
(*args, **kwargs)¶ Apply string function ‘splitlines’ to the name.
-
splitlines_stem
(*args, **kwargs)¶ Apply string function ‘splitlines’ to the stem.
-
startswith
(*args, **kwargs)¶ Apply string function ‘startswith’ to filename parts.
-
startswith_name
(*args, **kwargs)¶ Apply string function ‘startswith’ to the name.
-
startswith_stem
(*args, **kwargs)¶ Apply string function ‘startswith’ to the stem.
-
stem
¶ The final path component, minus its last suffix.
-
strip
(*args, **kwargs)¶ Apply string function ‘strip’ to filename parts.
-
strip_name
(*args, **kwargs)¶ Apply string function ‘strip’ to the name.
-
strip_stem
(*args, **kwargs)¶ Apply string function ‘strip’ to the stem.
-
suffix
¶ The final component’s last suffix, if any.
-
suffixes
¶ A list of the final component’s suffixes, if any.
-
swapcase
(*args, **kwargs)¶ Apply string function ‘swapcase’ to filename parts.
-
swapcase_name
(*args, **kwargs)¶ Apply string function ‘swapcase’ to the name.
-
swapcase_stem
(*args, **kwargs)¶ Apply string function ‘swapcase’ to the stem.
-
title
(*args, **kwargs)¶ Apply string function ‘title’ to filename parts.
-
title_name
(*args, **kwargs)¶ Apply string function ‘title’ to the name.
-
title_stem
(*args, **kwargs)¶ Apply string function ‘title’ to the stem.
-
tokenize
(token_pattern='(?u)([a-zA-Z0-9\\:]+)(?=[^a-zA-Z0-9\\:]|$)', exclude_extension=True)¶ Tokenise the name (without extension)
-
tokenize_name
(token_pattern='(?u)([a-zA-Z0-9\\:]+)(?=[^a-zA-Z0-9\\:]|$)')¶ Tokenise the name
-
tokenize_stem
(token_pattern='(?u)([a-zA-Z0-9\\:]+)(?=[^a-zA-Z0-9\\:]|$)')¶ Tokenise the stem (the name without its last suffix)
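The default token_pattern can be tried directly with the re module. This sketch assumes the signature above shows the repr of the pattern (so each doubled backslash is a single backslash in the actual regular expression) and that the tokens are collected with re.findall; both points are assumptions, and tokenize_sketch is a hypothetical helper.

```python
import re

# Default pattern from the signature, written as a raw string (assumption:
# the documented value is a repr, so '\\:' stands for '\:').
TOKEN_PATTERN = r"(?u)([a-zA-Z0-9\:]+)(?=[^a-zA-Z0-9\:]|$)"


def tokenize_sketch(text, token_pattern=TOKEN_PATTERN):
    """Return runs of letters, digits and colons found in text."""
    return re.findall(token_pattern, text)


stem_tokens = tokenize_sketch("demo code")
mixed_tokens = tokenize_sketch("report_2019-final")
print(stem_tokens, mixed_tokens)
```

Underscores, hyphens and spaces are not part of the character class, so they act as token separators.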
-
translate
(*args, **kwargs)¶ Apply string function ‘translate’ to filename parts.
-
translate_name
(*args, **kwargs)¶ Apply string function ‘translate’ to the name.
-
translate_stem
(*args, **kwargs)¶ Apply string function ‘translate’ to the stem.
-
upper
(*args, **kwargs)¶ Apply string function ‘upper’ to filename parts.
-
upper_name
(*args, **kwargs)¶ Apply string function ‘upper’ to the name.
-
upper_stem
(*args, **kwargs)¶ Apply string function ‘upper’ to the stem.
-
with_name
(name)¶ Return a new path with the file name changed.
-
with_suffix
(suffix)¶ Return a new path with the file suffix changed. If the path has no suffix, add given suffix. If the given suffix is an empty string, remove the suffix from the path.
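with_name and with_suffix are inherited from pathlib. The three with_suffix cases described above (change, add, remove) can be sketched with PureWindowsPath as a stand-in, assuming the subclass leaves them unchanged:

```python
from pathlib import PureWindowsPath

p = PureWindowsPath("F:/data/file.txt")

renamed = p.with_name("notes.md")    # replace the whole file name
changed = p.with_suffix(".csv")      # replace the existing suffix
added = PureWindowsPath("F:/README").with_suffix(".txt")  # no suffix yet: add one
removed = p.with_suffix("")          # empty string: drop the suffix

print(renamed, changed, added, removed, sep="\n")
```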
-
zfill
(*args, **kwargs)¶ Apply string function ‘zfill’ to filename parts.
-
zfill_name
(*args, **kwargs)¶ Apply string function ‘zfill’ to the name.
-
zfill_stem
(*args, **kwargs)¶ Apply string function ‘zfill’ to the stem.
-
-
class path2insight.PosixFilePath(*args)
Object to analyse a POSIX file or folder path.
PosixFilePath inherits from pathlib.PurePosixPath, which is available in Python 3.4 and later. See https://docs.python.org/3/library/pathlib.html#methods-and-properties for all properties and methods.
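The choice between the two classes matters because the underlying pathlib flavours split the same string differently. A small standard-library sketch (using the pure pathlib classes as stand-ins, which is an assumption about how the FilePath classes behave):

```python
from pathlib import PurePosixPath, PureWindowsPath

raw = r"data\archive/file.tar.gz"

# On POSIX a backslash is an ordinary character, not a separator.
posix_parts = PurePosixPath(raw).parts

# On Windows both the backslash and the forward slash separate components.
windows_parts = PureWindowsPath(raw).parts

posix_suffixes = PurePosixPath(raw).suffixes

print(posix_parts, windows_parts, posix_suffixes)
```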
-
anchor
¶ The concatenation of the drive and root, or ‘’.
-
as_posix
()¶ Return the string representation of the path with forward (/) slashes.
-
as_uri
()¶ Return the path as a ‘file’ URI.
-
capitalize
(*args, **kwargs)¶ Apply string function ‘capitalize’ to filename parts.
-
capitalize_name
(*args, **kwargs)¶ Apply string function ‘capitalize’ to the name.
-
capitalize_stem
(*args, **kwargs)¶ Apply string function ‘capitalize’ to the stem.
-
casefold
(*args, **kwargs)¶ Apply string function ‘casefold’ to filename parts.
-
casefold_name
(*args, **kwargs)¶ Apply string function ‘casefold’ to the name.
-
casefold_stem
(*args, **kwargs)¶ Apply string function ‘casefold’ to the stem.
-
center
(*args, **kwargs)¶ Apply string function ‘center’ to filename parts.
-
center_name
(*args, **kwargs)¶ Apply string function ‘center’ to the name.
-
center_stem
(*args, **kwargs)¶ Apply string function ‘center’ to the stem.
-
count
(*args, **kwargs)¶ Apply string function ‘count’ to filename parts.
-
count_name
(*args, **kwargs)¶ Apply string function ‘count’ to the name.
-
count_stem
(*args, **kwargs)¶ Apply string function ‘count’ to the stem.
-
depth
¶ Compute the depth of the path.
Example:
>>> WindowsFilePath('R:/Armel/path2insight/demo.py').depth
3
-
drive
¶ The drive prefix (letter or UNC path), if any.
-
encode
(*args, **kwargs)¶ Apply string function ‘encode’ to filename parts.
-
encode_name
(*args, **kwargs)¶ Apply string function ‘encode’ to the name.
-
encode_stem
(*args, **kwargs)¶ Apply string function ‘encode’ to the stem.
-
endswith
(*args, **kwargs)¶ Apply string function ‘endswith’ to filename parts.
-
endswith_name
(*args, **kwargs)¶ Apply string function ‘endswith’ to the name.
-
endswith_stem
(*args, **kwargs)¶ Apply string function ‘endswith’ to the stem.
-
expandtabs
(*args, **kwargs)¶ Apply string function ‘expandtabs’ to filename parts.
-
expandtabs_name
(*args, **kwargs)¶ Apply string function ‘expandtabs’ to the name.
-
expandtabs_stem
(*args, **kwargs)¶ Apply string function ‘expandtabs’ to the stem.
-
extension
¶ Masked property from self.suffix
-
extensions
¶ Masked property from self.suffixes
-
find
(*args, **kwargs)¶ Apply string function ‘find’ to filename parts.
-
find_name
(*args, **kwargs)¶ Apply string function ‘find’ to the name.
-
find_stem
(*args, **kwargs)¶ Apply string function ‘find’ to the stem.
-
format
(*args, **kwargs)¶ Apply string function ‘format’ to filename parts.
-
format_map
(*args, **kwargs)¶ Apply string function ‘format_map’ to filename parts.
-
format_map_name
(*args, **kwargs)¶ Apply string function ‘format_map’ to the name.
-
format_map_stem
(*args, **kwargs)¶ Apply string function ‘format_map’ to the stem.
-
format_name
(*args, **kwargs)¶ Apply string function ‘format’ to the name.
-
format_stem
(*args, **kwargs)¶ Apply string function ‘format’ to the stem.
-
index
(*args, **kwargs)¶ Apply string function ‘index’ to filename parts.
-
index_name
(*args, **kwargs)¶ Apply string function ‘index’ to the name.
-
index_stem
(*args, **kwargs)¶ Apply string function ‘index’ to the stem.
-
is_absolute
()¶ True if the path is absolute (has both a root and, if applicable, a drive).
-
is_reserved
()¶ Return True if the path contains one of the special names reserved by the system, if any.
-
isalnum
(*args, **kwargs)¶ Apply string function ‘isalnum’ to filename parts.
-
isalnum_name
(*args, **kwargs)¶ Apply string function ‘isalnum’ to the name.
-
isalnum_stem
(*args, **kwargs)¶ Apply string function ‘isalnum’ to the stem.
-
isalpha
(*args, **kwargs)¶ Apply string function ‘isalpha’ to filename parts.
-
isalpha_name
(*args, **kwargs)¶ Apply string function ‘isalpha’ to the name.
-
isalpha_stem
(*args, **kwargs)¶ Apply string function ‘isalpha’ to the stem.
-
isascii
(*args, **kwargs)¶ Apply string function ‘isascii’ to filename parts.
-
isascii_name
(*args, **kwargs)¶ Apply string function ‘isascii’ to the name.
-
isascii_stem
(*args, **kwargs)¶ Apply string function ‘isascii’ to the stem.
-
isdecimal
(*args, **kwargs)¶ Apply string function ‘isdecimal’ to filename parts.
-
isdecimal_name
(*args, **kwargs)¶ Apply string function ‘isdecimal’ to the name.
-
isdecimal_stem
(*args, **kwargs)¶ Apply string function ‘isdecimal’ to the stem.
-
isdigit
(*args, **kwargs)¶ Apply string function ‘isdigit’ to filename parts.
-
isdigit_name
(*args, **kwargs)¶ Apply string function ‘isdigit’ to the name.
-
isdigit_stem
(*args, **kwargs)¶ Apply string function ‘isdigit’ to the stem.
-
isidentifier
(*args, **kwargs)¶ Apply string function ‘isidentifier’ to filename parts.
-
isidentifier_name
(*args, **kwargs)¶ Apply string function ‘isidentifier’ to the name.
-
isidentifier_stem
(*args, **kwargs)¶ Apply string function ‘isidentifier’ to the stem.
-
islower
(*args, **kwargs)¶ Apply string function ‘islower’ to filename parts.
-
islower_name
(*args, **kwargs)¶ Apply string function ‘islower’ to the name.
-
islower_stem
(*args, **kwargs)¶ Apply string function ‘islower’ to the stem.
-
isnumeric
(*args, **kwargs)¶ Apply string function ‘isnumeric’ to filename parts.
-
isnumeric_name
(*args, **kwargs)¶ Apply string function ‘isnumeric’ to the name.
-
isnumeric_stem
(*args, **kwargs)¶ Apply string function ‘isnumeric’ to the stem.
-
isprintable
(*args, **kwargs)¶ Apply string function ‘isprintable’ to filename parts.
-
isprintable_name
(*args, **kwargs)¶ Apply string function ‘isprintable’ to the name.
-
isprintable_stem
(*args, **kwargs)¶ Apply string function ‘isprintable’ to the stem.
-
isspace
(*args, **kwargs)¶ Apply string function ‘isspace’ to filename parts.
-
isspace_name
(*args, **kwargs)¶ Apply string function ‘isspace’ to the name.
-
isspace_stem
(*args, **kwargs)¶ Apply string function ‘isspace’ to the stem.
-
istitle
(*args, **kwargs)¶ Apply string function ‘istitle’ to filename parts.
-
istitle_name
(*args, **kwargs)¶ Apply string function ‘istitle’ to the name.
-
istitle_stem
(*args, **kwargs)¶ Apply string function ‘istitle’ to the stem.
-
isupper
(*args, **kwargs)¶ Apply string function ‘isupper’ to filename parts.
-
isupper_name
(*args, **kwargs)¶ Apply string function ‘isupper’ to the name.
-
isupper_stem
(*args, **kwargs)¶ Apply string function ‘isupper’ to the stem.
-
join
(*args, **kwargs)¶ Apply string function ‘join’ to filename parts.
-
join_name
(*args, **kwargs)¶ Apply string function ‘join’ to the name.
-
join_stem
(*args, **kwargs)¶ Apply string function ‘join’ to the stem.
-
joinpath
(*args)¶ Combine this path with one or several arguments, and return a new path representing either a subpath (if all arguments are relative paths) or a totally different path (if one of the arguments is anchored).
-
ljust
(*args, **kwargs)¶ Apply string function ‘ljust’ to filename parts.
-
ljust_name
(*args, **kwargs)¶ Apply string function ‘ljust’ to the name.
-
ljust_stem
(*args, **kwargs)¶ Apply string function ‘ljust’ to the stem.
-
lower
(*args, **kwargs)¶ Apply string function ‘lower’ to filename parts.
-
lower_name
(*args, **kwargs)¶ Apply string function ‘lower’ to the name.
-
lower_stem
(*args, **kwargs)¶ Apply string function ‘lower’ to the stem.
-
lstrip
(*args, **kwargs)¶ Apply string function ‘lstrip’ to filename parts.
-
lstrip_name
(*args, **kwargs)¶ Apply string function ‘lstrip’ to the name.
-
lstrip_stem
(*args, **kwargs)¶ Apply string function ‘lstrip’ to the stem.
-
maketrans
(*args, **kwargs)¶ Apply string function ‘maketrans’ to filename parts.
-
maketrans_name
(*args, **kwargs)¶ Apply string function ‘maketrans’ to the name.
-
maketrans_stem
(*args, **kwargs)¶ Apply string function ‘maketrans’ to the stem.
-
match
(path_pattern)¶ Return True if this path matches the given pattern.
-
name
¶ The final path component, if any.
-
parent
¶ The logical parent of the path.
-
parents
¶ A sequence of this path’s logical parents.
-
partition
(*args, **kwargs)¶ Apply string function ‘partition’ to filename parts.
-
partition_name
(*args, **kwargs)¶ Apply string function ‘partition’ to the name.
-
partition_stem
(*args, **kwargs)¶ Apply string function ‘partition’ to the stem.
-
parts
¶ An object providing sequence-like access to the components in the filesystem path.
-
relative_to
(*other)¶ Return the relative path to another path identified by the passed arguments. If the operation is not possible (because this is not a subpath of the other path), raise ValueError.
-
replace
(*args, **kwargs)¶ Apply string function ‘replace’ to filename parts.
-
replace_name
(*args, **kwargs)¶ Apply string function ‘replace’ to the name.
-
replace_stem
(*args, **kwargs)¶ Apply string function ‘replace’ to the stem.
-
rfind
(*args, **kwargs)¶ Apply string function ‘rfind’ to filename parts.
-
rfind_name
(*args, **kwargs)¶ Apply string function ‘rfind’ to the name.
-
rfind_stem
(*args, **kwargs)¶ Apply string function ‘rfind’ to the stem.
-
rindex
(*args, **kwargs)¶ Apply string function ‘rindex’ to filename parts.
-
rindex_name
(*args, **kwargs)¶ Apply string function ‘rindex’ to the name.
-
rindex_stem
(*args, **kwargs)¶ Apply string function ‘rindex’ to the stem.
-
rjust
(*args, **kwargs)¶ Apply string function ‘rjust’ to filename parts.
-
rjust_name
(*args, **kwargs)¶ Apply string function ‘rjust’ to the name.
-
rjust_stem
(*args, **kwargs)¶ Apply string function ‘rjust’ to the stem.
-
root
¶ The root of the path, if any.
-
rpartition
(*args, **kwargs)¶ Apply string function ‘rpartition’ to filename parts.
-
rpartition_name
(*args, **kwargs)¶ Apply string function ‘rpartition’ to the name.
-
rpartition_stem
(*args, **kwargs)¶ Apply string function ‘rpartition’ to the stem.
-
rsplit
(*args, **kwargs)¶ Apply string function ‘rsplit’ to filename parts.
-
rsplit_name
(*args, **kwargs)¶ Apply string function ‘rsplit’ to the name.
-
rsplit_stem
(*args, **kwargs)¶ Apply string function ‘rsplit’ to the stem.
-
rstrip
(*args, **kwargs)¶ Apply string function ‘rstrip’ to filename parts.
-
rstrip_name
(*args, **kwargs)¶ Apply string function ‘rstrip’ to the name.
-
rstrip_stem
(*args, **kwargs)¶ Apply string function ‘rstrip’ to the stem.
-
split
(*args, **kwargs)¶ Apply string function ‘split’ to filename parts.
-
split_name
(*args, **kwargs)¶ Apply string function ‘split’ to the name.
-
split_stem
(*args, **kwargs)¶ Apply string function ‘split’ to the stem.
-
splitlines
(*args, **kwargs)¶ Apply string function ‘splitlines’ to filename parts.
-
splitlines_name
(*args, **kwargs)¶ Apply string function ‘splitlines’ to the name.
-
splitlines_stem
(*args, **kwargs)¶ Apply string function ‘splitlines’ to the stem.
-
startswith
(*args, **kwargs)¶ Apply string function ‘startswith’ to filename parts.
-
startswith_name
(*args, **kwargs)¶ Apply string function ‘startswith’ to the name.
-
startswith_stem
(*args, **kwargs)¶ Apply string function ‘startswith’ to the stem.
-
stem
¶ The final path component, minus its last suffix.
-
strip
(*args, **kwargs)¶ Apply string function ‘strip’ to filename parts.
-
strip_name
(*args, **kwargs)¶ Apply string function ‘strip’ to the name.
-
strip_stem
(*args, **kwargs)¶ Apply string function ‘strip’ to the stem.
-
suffix
¶ The final component’s last suffix, if any.
-
suffixes
¶ A list of the final component’s suffixes, if any.
-
swapcase
(*args, **kwargs)¶ Apply string function ‘swapcase’ to filename parts.
-
swapcase_name
(*args, **kwargs)¶ Apply string function ‘swapcase’ to the name.
-
swapcase_stem
(*args, **kwargs)¶ Apply string function ‘swapcase’ to the stem.
-
title
(*args, **kwargs)¶ Apply string function ‘title’ to filename parts.
-
title_name
(*args, **kwargs)¶ Apply string function ‘title’ to the name.
-
title_stem
(*args, **kwargs)¶ Apply string function ‘title’ to the stem.
-
tokenize
(token_pattern='(?u)([a-zA-Z0-9\\:]+)(?=[^a-zA-Z0-9\\:]|$)', exclude_extension=True)¶ Tokenise the name (without extension)
-
tokenize_name
(token_pattern='(?u)([a-zA-Z0-9\\:]+)(?=[^a-zA-Z0-9\\:]|$)')¶ Tokenise the name
-
tokenize_stem
(token_pattern='(?u)([a-zA-Z0-9\\:]+)(?=[^a-zA-Z0-9\\:]|$)')¶ Tokenise the stem (the name without its last suffix)
-
translate
(*args, **kwargs)¶ Apply string function ‘translate’ to filename parts.
-
translate_name
(*args, **kwargs)¶ Apply string function ‘translate’ to the name.
-
translate_stem
(*args, **kwargs)¶ Apply string function ‘translate’ to the stem.
-
upper
(*args, **kwargs)¶ Apply string function ‘upper’ to filename parts.
-
upper_name
(*args, **kwargs)¶ Apply string function ‘upper’ to the name.
-
upper_stem
(*args, **kwargs)¶ Apply string function ‘upper’ to the stem.
-
with_name
(name)¶ Return a new path with the file name changed.
-
with_suffix
(suffix)¶ Return a new path with the file suffix changed. If the path has no suffix, add given suffix. If the given suffix is an empty string, remove the suffix from the path.
-
zfill
(*args, **kwargs)¶ Apply string function ‘zfill’ to filename parts.
-
zfill_name
(*args, **kwargs)¶ Apply string function ‘zfill’ to the name.
-
zfill_stem
(*args, **kwargs)¶ Apply string function ‘zfill’ to the stem.
-
Parsing
-
path2insight.
parse
(obj, os_name=None)¶ Parse (list of) file paths.
Parse a list of file paths into a list of WindowsFilePath and PosixFilePath objects. This function can parse a list, tuple, numpy.ndarray or pandas.Series. This is done with one of the following parsers: parse_from_pandas, parse_from_numpy or parse_from_list.
Example:
>>> data = ['file1.xml', 'data/file1.txt', 'data/file2.txt']
>>> path2insight.parse(data, os_name='windows')
gives the same result as
>>> import pandas
>>> path2insight.parse(pandas.Series(data), os_name='windows')
Returns: A list of WindowsFilePath and PosixFilePath objects.
Return type: list
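The role of os_name can be pictured as choosing which path class is applied to every string. The sketch below uses the pure pathlib classes as stand-ins and a hypothetical parse_sketch helper; how the library behaves when os_name is None is not documented here, so the sketch simply rejects it.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Stand-ins for WindowsFilePath and PosixFilePath (assumption).
PATH_CLASSES = {"windows": PureWindowsPath, "posix": PurePosixPath}


def parse_sketch(paths, os_name):
    """Convert an iterable of strings into path objects for one OS flavour."""
    if os_name not in PATH_CLASSES:
        raise ValueError("os_name must be 'windows' or 'posix'")
    path_class = PATH_CLASSES[os_name]
    return [path_class(p) for p in paths]


data = ["file1.xml", "data/file1.txt", "data/file2.txt"]
parsed = parse_sketch(data, os_name="windows")
print(parsed)
```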
-
path2insight.
parse_from_list
(l, os_name=None)¶ Parse a list with file paths.
See path2insight.parse() for additional information.
Returns: A list of WindowsFilePath and PosixFilePath objects.
Return type: list
-
path2insight.
parse_from_numpy
(np_object, os_name=None)¶ Parse a numpy array with file paths.
See path2insight.parse() for additional information.
Parameters:
- obj (numpy.ndarray) – A numpy.ndarray with filepaths in the form of strings.
- os_name (str) – The operating system on which the filepaths were collected. The options are ‘windows’ or ‘posix’ for Windows and POSIX systems respectively.
Returns: A list of WindowsFilePath and PosixFilePath objects.
Return type: list
-
path2insight.
parse_from_pandas
(pandas_object, os_name=None)¶ Parse a series or dataframe with file paths.
See path2insight.parse() for additional information.
Parameters:
- obj (pandas.Series or pandas.DataFrame) – A pandas.Series or pandas.DataFrame with filepaths in the form of strings.
- os_name (str) – The operating system on which the filepaths were collected. The options are ‘windows’ or ‘posix’ for Windows and POSIX systems respectively.
Returns: A list of WindowsFilePath and PosixFilePath objects.
Return type: list
Handling
-
path2insight.
sort
(paths, level=None, reverse=False)¶ Sort a list of filepaths.
This function sorts a list of filepaths. The sorting can be based on parts of the path, such as folder or file names. This is done with the key argument.
Returns: A sorted list.
Return type: list
Example:
>>> path2insight.sort(data)
>>> path2insight.sort(data, key=1)
>>> path2insight.sort(data, key=1, reverse=True)
>>> path2insight.sort(data, key=[5, 4])
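One plausible reading of the integer arguments in the example is that they index into the parts of each path. The sketch below sorts on that basis with the standard library; the helper sort_sketch, the treatment of missing levels as empty strings, and the interpretation of the integers are all assumptions rather than documented behaviour.

```python
from pathlib import PureWindowsPath


def sort_sketch(paths, level=None, reverse=False):
    """Sort paths as a whole, or on one or more part indices."""
    if level is None:
        return sorted(paths, reverse=reverse)
    levels = [level] if isinstance(level, int) else list(level)

    def sort_key(path):
        # A path that is too short sorts as an empty string on that level.
        return tuple(path.parts[i] if i < len(path.parts) else "" for i in levels)

    return sorted(paths, key=sort_key, reverse=reverse)


data = [
    PureWindowsPath("F:/test/b.txt"),
    PureWindowsPath("F:/data/c.txt"),
    PureWindowsPath("F:/docs/a.txt"),
]
by_folder = [p.parts[1] for p in sort_sketch(data, level=1)]
by_file = [p.name for p in sort_sketch(data, level=2)]
by_folder_desc = [p.parts[1] for p in sort_sketch(data, level=1, reverse=True)]
print(by_folder, by_file, by_folder_desc)
```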
-
path2insight.
sample
(data, n=None)¶ Take a random sample of filepaths.
Returns: A list with a sample of filepaths.
Return type: list
-
path2insight.
select
(paths, **kwargs)¶ Select from a list of filepaths.
This function selects from a list of filepaths based on the names of their parts (such as folder or file names). This is done with the level arguments.
Returns: A list with the matching filepaths.
Return type: list
Note: One can use the value “*” to select all file paths. If a file path doesn’t have a value on that level (because the level is higher than the number of parts), then the path is excluded from the selection. One can also use True instead of “*”.
Example: Selection based on the name of a level.
>>> import path2insight
>>> data = [path2insight.WindowsFilePath("F:/data/file.txt"),
...         path2insight.WindowsFilePath("F:/docs/file.xlsx"),
...         path2insight.WindowsFilePath("F:/test/file.demo"),
...         path2insight.WindowsFilePath("F:/README.txt")]
>>> path2insight.select(data, level1='data')
[path2insight.WindowsFilePath("F:/data/file.txt")]
Selection based on the existence of a level (wildcard). A path is only included when the level exists.
>>> path2insight.select(data, level2='*')
[path2insight.WindowsFilePath("F:/data/file.txt"), path2insight.WindowsFilePath("F:/docs/file.xlsx"), path2insight.WindowsFilePath("F:/test/file.demo")]
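The two examples are consistent with levelN referring to index N of the path's parts, where index 0 is the drive and root. A minimal standard-library sketch of that selection rule follows; select_sketch is a hypothetical helper and the indexing is inferred from the examples, not taken from the library source.

```python
from pathlib import PureWindowsPath


def select_sketch(paths, **levels):
    """Keep paths whose parts match every levelN=value argument."""
    selected = []
    for path in paths:
        keep = True
        for key, wanted in levels.items():
            index = int(key[len("level"):])
            if index >= len(path.parts):
                keep = False      # the level does not exist: exclude the path
                break
            if wanted == "*" or wanted is True:
                continue          # wildcard: the level only has to exist
            if path.parts[index] != wanted:
                keep = False
                break
        if keep:
            selected.append(path)
    return selected


data = [
    PureWindowsPath("F:/data/file.txt"),
    PureWindowsPath("F:/docs/file.xlsx"),
    PureWindowsPath("F:/test/file.demo"),
    PureWindowsPath("F:/README.txt"),
]
named = select_sketch(data, level1="data")
existing = select_sketch(data, level2="*")
print(named, existing)
```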
-
path2insight.
select_re
(paths, **kwargs)¶ Select from a list of filepaths using regular expressions.
This function selects from a list of filepaths based on the names of their parts (such as folder or file names). This is done with the level arguments, which are given as regular expressions.
Returns: A list with the matching filepaths.
Return type: list
Example: Selection based on the name of a level.
>>> import path2insight
>>> data = [path2insight.WindowsFilePath("F:/data/file.txt"),
...         path2insight.WindowsFilePath("F:/docs/file.xlsx"),
...         path2insight.WindowsFilePath("F:/test/file.demo"),
...         path2insight.WindowsFilePath("F:/README.txt")]
>>> path2insight.select_re(data, level1=r"[A-Z]")
[path2insight.WindowsFilePath("F:/README.txt")]
Select all paths starting with the letter d on the first level.
>>> path2insight.select_re(data, level1=r"^d")
[path2insight.WindowsFilePath("F:/data/file.txt"), path2insight.WindowsFilePath("F:/docs/file.xlsx")]
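The regular-expression variant can be sketched in the same way, matching each requested part with the re module. Whether the library uses re.search or re.match is not stated; the sketch assumes re.search, which agrees with both examples above. select_re_sketch is a hypothetical helper.

```python
import re
from pathlib import PureWindowsPath


def select_re_sketch(paths, **levels):
    """Keep paths whose parts match every levelN=pattern argument."""
    selected = []
    for path in paths:
        parts = path.parts
        matches = True
        for key, pattern in levels.items():
            index = int(key[len("level"):])
            # A missing level, or a part without a match, excludes the path.
            if index >= len(parts) or re.search(pattern, parts[index]) is None:
                matches = False
                break
        if matches:
            selected.append(path)
    return selected


data = [
    PureWindowsPath("F:/data/file.txt"),
    PureWindowsPath("F:/docs/file.xlsx"),
    PureWindowsPath("F:/test/file.demo"),
    PureWindowsPath("F:/README.txt"),
]
uppercase = select_re_sketch(data, level1=r"[A-Z]")
starts_with_d = select_re_sketch(data, level1=r"^d")
print(uppercase, starts_with_d)
```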
Explore
-
path2insight.explore.stats.
depth_counts
(x, normalize=False, center=None)¶ Count the filepath-depths.
This function counts the filepath depths of a list of filepaths. The function returns a Python collections.Counter object. This Counter object can be used to compute the most common depths or subtract other Counter objects. For all options, see the Python documentation.
Returns: filepath depths counted
Return type: collections.Counter
Example:
>>> path2insight.depth_counts(list_of_filepaths)
Counter({5: 32, 6: 654, 7: 284, 8: 13, 9: 11, 10: 1, 11: 4, 13: 1})
Note: To get a Python dict, simply wrap the Counter object with dict().
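A depth count of this kind can be approximated with collections.Counter from the standard library. In the sketch below the depth is taken as the number of parts below the anchor and normalize is assumed to turn counts into fractions of the total; the center parameter is left out because its meaning is not described here. depth_counts_sketch is a hypothetical helper.

```python
from collections import Counter
from pathlib import PureWindowsPath


def depth_counts_sketch(paths, normalize=False):
    """Count how often each path depth occurs."""
    counts = Counter(len(p.parts) - (1 if p.anchor else 0) for p in paths)
    if normalize:
        total = sum(counts.values())
        # Assumption: normalised values are fractions that sum to 1.
        return Counter({depth: n / total for depth, n in counts.items()})
    return counts


paths = [
    PureWindowsPath("F:/data/file.txt"),
    PureWindowsPath("F:/docs/file.xlsx"),
    PureWindowsPath("F:/README.txt"),
    PureWindowsPath("F:/a/b/c.txt"),
]
counts = depth_counts_sketch(paths)
fractions = depth_counts_sketch(paths, normalize=True)
print(counts.most_common(1), dict(fractions))
```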
-
path2insight.explore.stats.
drive_counts
(x, lower=False, normalize=False)¶ Count the drives of the paths.
This function counts the drives of a list of filepaths. The function returns a Python collections.Counter object. This Counter object can be used to compute the most common drives or subtract other Counter objects. For all options, see the Python documentation.
Returns: drives counted
Return type: collections.Counter
Note: To get a Python dict, simply wrap the Counter object with dict().
-
path2insight.explore.stats.
extension_chisquare
(x, y=None, lower=True)¶ Calculates a one-way chi square test for file extensions.
Parameters: Returns: The test result.
Return type: scipy.stats.Power_divergenceResult
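A one-way chi-square test compares observed category counts with expected counts. The sketch below computes only the test statistic by hand for file extensions. It is an illustration under assumptions, not the library code: extension_chi2_statistic is a hypothetical helper, it assumes that y supplies the expected distribution, and it ignores extensions that occur in x but not in y. In SciPy, scipy.stats.chisquare(f_obs, f_exp) performs the full test and returns the Power_divergenceResult named above.

```python
from collections import Counter
from pathlib import PurePosixPath


def extension_chi2_statistic(x, y):
    """Chi-square statistic of the extension counts in x against those in y."""
    observed = Counter(p.suffix.lower() for p in x)
    reference = Counter(p.suffix.lower() for p in y)
    # Scale the reference counts so that they sum to the observed total.
    scale = sum(observed.values()) / sum(reference.values())
    statistic = 0.0
    for ext, ref_count in reference.items():
        expected = ref_count * scale
        statistic += (observed.get(ext, 0) - expected) ** 2 / expected
    return statistic


x = [PurePosixPath(name) for name in ("a.txt", "b.txt", "c.txt", "d.zip")]
y = [PurePosixPath(name) for name in ("e.txt", "f.txt", "g.zip", "h.zip")]

statistic = extension_chi2_statistic(x, y)  # (3-2)^2/2 + (1-2)^2/2 = 1.0
```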
-
path2insight.explore.stats.
extension_counts
(x, lower=False, normalize=False)¶ Count the extensions of the filenames.
This function counts the name extensions of a list of filepaths. The function returns a Python collections.Counter object. This Counter object can be used to compute the most common extensions or subtract other Counter objects. For all options, see the Python documentation.
Parameters:
Returns: extensions counted
Return type: collections.Counter
Example:
>>> path2insight.extension_counts(filepaths_list)
Counter({'.zip': 42, '.raw': 3, '.txt': 12, '.docx': 1})
>>> path2insight.extension_counts(filepaths_list).most_common(3)
[('.zip', 42), ('.txt', 12), ('.raw', 3)]
Note: To get a Python dict, simply wrap the Counter object with dict().
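The effect of the lower and normalize options can be illustrated with a small stand-in. This sketch assumes that lower lowercases the extension before counting and that normalize divides each count by the total; count_extensions is a hypothetical helper and may differ from the library's behaviour in detail.

```python
from collections import Counter
from pathlib import PurePosixPath


def count_extensions(paths, lower=False, normalize=False):
    """Count file extensions, optionally lowercased and as relative frequencies."""
    extensions = [p.suffix.lower() if lower else p.suffix for p in paths]
    counts = Counter(extensions)
    if normalize:
        total = sum(counts.values())
        # Relative frequencies, kept in a Counter so most_common() still works.
        counts = Counter({ext: n / total for ext, n in counts.items()})
    return counts


paths = [PurePosixPath(name) for name in ("a.ZIP", "b.zip", "c.zip", "d.txt")]

raw = count_extensions(paths)                                  # '.ZIP' and '.zip' counted apart
lowered = count_extensions(paths, lower=True)                  # '.zip' counted three times
shares = count_extensions(paths, lower=True, normalize=True)   # fractions of the total
```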
-
path2insight.explore.stats.
n_extension_counts
(x)¶ [CHANGE FUNCTION NAME] Count the number of extensions.
-
path2insight.explore.stats.
name_chisquare
(x, y=None, lower=True)¶ Calculates a one-way chi square test for file names.
Parameters: Returns: The test result.
Return type: scipy.stats.Power_divergenceResult
-
path2insight.explore.stats.
name_counts
(x, lower=False, normalize=False)¶ Count the names.
This function counts the names of a list of filepaths. The function returns a Python collections.Counter object. This Counter object can be used to compute the most common names or subtract other Counter objects. For all options, see the Python documentation.
Parameters:
Returns: names counted
Return type: collections.Counter
Note: To get a Python dict, simply wrap the Counter object with dict().
-
path2insight.explore.stats.
stem_chisquare
(x, y=None, lower=True)¶ Calculates a one-way chi square test for file name stems.
Parameters: Returns: The test result.
Return type: scipy.stats.Power_divergenceResult
-
path2insight.explore.stats.
stem_counts
(x, lower=False, normalize=False)¶ Count the stems.
This function counts the stems of a list of filepaths. The function returns a Python collections.Counter object. This Counter object can be used to compute the most common stems or subtract other Counter objects. For all options, see the Python documentation.
Parameters:
Returns: stems counted
Return type: collections.Counter
Note: To get a Python dict, simply wrap the Counter object with dict().
-
path2insight.explore.stats.
token_counts
(x, tokenizer=<function default_tokenizer>, lower=False, parents=False, stem=True, extension=False, normalize=False)¶ Count the tokens in the paths.
This function counts the tokens of a list of filepaths. Use the boolean settings to include the parents, stem and extension. The function returns a Python collections.Counter object. This Counter object can be used to compute the most common tokens or subtract other Counter objects. For all options, see the Python documentation.
Parameters: - x (list, tuple, array of WindowsFilePath or PosixFilePath objects) – Paths to count the tokens of.
- parents (bool) – Tokenize the parents.
- stem (bool) – Tokenize the stem.
- extension (bool) – Tokenize the extension.
- lower (bool) – Convert the filepath to lowercase before counting.
- normalize (bool) – Normalize the Counter result. Default False.
Returns: tokens counted
Return type: collections.Counter
Example:
>>> path2insight.token_counts(data).most_common(3)
[('FUNC001', 288), ('LTQ', 173), ('FUNCTNS', 96)]
Note: To get a Python dict, simply wrap the Counter object with dict().
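How the parents, stem and extension switches change the counted tokens can be sketched as follows. The tokenizer here splits on anything that is not a letter or digit, which is only an assumption about what default_tokenizer does; simple_tokenizer and count_tokens are hypothetical names, and pathlib.PureWindowsPath stands in for WindowsFilePath.

```python
import re
from collections import Counter
from pathlib import PureWindowsPath


def simple_tokenizer(text):
    """Split on runs of characters that are not ASCII letters or digits."""
    return [t for t in re.split(r"[^A-Za-z0-9]+", text) if t]


def count_tokens(paths, lower=False, parents=False, stem=True, extension=False):
    """Count tokens taken from the selected pieces of each path."""
    counts = Counter()
    for p in paths:
        pieces = []
        if parents:
            pieces.extend(p.parent.parts[1:])  # skip the anchor, e.g. 'D:\\'
        if stem:
            pieces.append(p.stem)
        if extension:
            pieces.append(p.suffix)
        for piece in pieces:
            if lower:
                piece = piece.lower()
            counts.update(simple_tokenizer(piece))
    return counts


paths = [
    PureWindowsPath("D:/Documents/ProjectX/DEMO code.py"),
    PureWindowsPath("D:/Documents/demo_data.csv"),
]

stem_only = count_tokens(paths, lower=True)
with_parents = count_tokens(paths, lower=True, parents=True)
```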
-
path2insight.explore.metrics.
distance_on_depth
(x, y=None, metric='l2', n_jobs=1)¶ Compute the distance between filenames based on the depth.
The distance between filenames is computed based on the difference in depth between the filenames.
Parameters: - x (list) – A list of filepath objects.
- y (list) – A list of filepath objects to compare x with. If y is None, the filepaths in x are compared with each other.
- metric (string, or callable) – The distance metric like ‘cityblock’, ‘cosine’, ‘euclidean’, ‘l1’, ‘l2’, ‘manhattan’. See http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.pairwise_distances.html for all possible metrics. Default ‘l2’.
- n_jobs (int) – The number of cores to use during the computation of the metric. Default 1.
Example:
>>> from path2insight.explore import distance_on_depth
>>> import seaborn as sns
>>> d = distance_on_depth(DATASET)
>>> sns.heatmap(d)
Note: For visual inspection, the heatmap function in seaborn can be useful.
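With depth as the only feature, the ‘l1’ and ‘l2’ metrics both reduce to the absolute difference in depth. The sketch below builds that pairwise matrix in plain Python; depth_distance_matrix is a hypothetical helper, and depth is assumed to be the number of parts below the root.

```python
from pathlib import PurePosixPath


def path_depth(path):
    """Number of parts below the root of an absolute path."""
    return len(path.parts) - 1


def depth_distance_matrix(x, y=None):
    """Pairwise absolute depth difference between the paths in x and y."""
    y = x if y is None else y
    return [[abs(path_depth(a) - path_depth(b)) for b in y] for a in x]


paths = [
    PurePosixPath("/a/b/c.txt"),  # depth 3
    PurePosixPath("/a/d.txt"),    # depth 2
    PurePosixPath("/e.txt"),      # depth 1
]
matrix = depth_distance_matrix(paths)
```

The result is a symmetric matrix with zeros on the diagonal, which is the shape of data a heatmap expects.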
-
path2insight.explore.metrics.
distance_on_extension
(x, y=None, tokenizer=None, metric='l2', n_jobs=1)¶ Compute the distance between filenames based on the extension.
The distance between filenames is computed based on the number of extensions that both filenames have in common.
Parameters: - x (list) – A list of filepath objects.
- y (list) – A list of filepath objects to compare x with. If y is None, the filepaths in x are compared with each other.
- metric (string, or callable) – The distance metric like ‘cityblock’, ‘cosine’, ‘euclidean’, ‘l1’, ‘l2’, ‘manhattan’. See http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.pairwise_distances.html for all possible metrics. Default ‘l2’.
- n_jobs (int) – The number of cores to use during the computation of the metric. Default 1.
Example:
>>> from path2insight.explore import distance_on_extension
>>> import seaborn as sns
>>> d = distance_on_extension(DATASET)
>>> sns.heatmap(d)
Note: For visual inspection, the heatmap function in seaborn can be useful.
-
path2insight.explore.metrics.
distance_on_token
(x, y=None, tokenizer=None, metric='l2', n_jobs=1)¶ Compute the distance between filenames based on tokens.
The distance between filenames is computed based on the number of tokens that both filenames have in common.
Parameters: - x (list) – A list of filepath objects.
- y (list) – A list of filepath objects to compare x with. If y is None, the filepaths in x are compared with each other.
- tokenizer (callable) – Not implemented yet.
- metric (string, or callable) – The distance metric like ‘cityblock’, ‘cosine’, ‘euclidean’, ‘l1’, ‘l2’, ‘manhattan’. See http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.pairwise_distances.html for all possible metrics. Default ‘l2’.
- n_jobs (int) – The number of cores to use during the computation of the metric. Default 1.
Example:
>>> from path2insight.explore import distance_on_token
>>> import seaborn as sns
>>> d = distance_on_token(DATASET)
>>> sns.heatmap(d)
Note: For visual inspection, the heatmap function in seaborn can be useful.
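One way to turn shared tokens into a distance is to encode each filename as a binary token-presence vector and take the Euclidean (‘l2’) distance between vectors. The sketch below follows that idea; the encoding, the tokenizer and the helper names are assumptions made for illustration and are not taken from the library.

```python
import math
import re
from pathlib import PurePosixPath


def stem_tokens(path):
    """Lowercased tokens of the file stem, split on non-alphanumerics."""
    return {t.lower() for t in re.split(r"[^A-Za-z0-9]+", path.stem) if t}


def token_distance_matrix(paths):
    """Pairwise Euclidean distance between binary token-presence vectors."""
    token_sets = [stem_tokens(p) for p in paths]
    vocabulary = sorted(set().union(*token_sets))
    vectors = [[1 if tok in s else 0 for tok in vocabulary] for s in token_sets]
    return [[math.dist(u, v) for v in vectors] for u in vectors]


paths = [
    PurePosixPath("/data/report_2017_final.txt"),
    PurePosixPath("/data/report_2018_final.txt"),
    PurePosixPath("/data/readme.md"),
]
distances = token_distance_matrix(paths)
```

With binary vectors the squared distance equals the number of tokens that occur in exactly one of the two filenames, so the two report files are closer to each other than either is to the readme.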
Tokenizing¶
-
path2insight.tokenizers.tokenizers.
camel_splitter
(x)¶ Make tokens from camelCase strings
Parameters: x (WindowsFilePath, PosixFilePath, str) – The filepath or string.
Returns: list of tokens (strings)
Return_type: list
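Splitting camelCase text can be done with a zero-width regular expression. This is a minimal sketch of the idea, not the library's camel_splitter: split_camel_case is a hypothetical helper that breaks wherever a lowercase letter or digit is followed by an uppercase letter, so runs of capitals such as acronyms stay together.

```python
import re


def split_camel_case(text):
    """Break before each uppercase letter that follows a lowercase letter or digit."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", text).split()


tokens = split_camel_case("projectPlanFinal")  # ['project', 'Plan', 'Final']
acronym = split_camel_case("HTMLParser")       # ['HTMLParser'], no lowercase-to-uppercase step
```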
-
path2insight.tokenizers.tokenizers.
default_tokenizer
(x)¶ Make tokens of a file path or string.
Parameters: x (WindowsFilePath, PosixFilePath, str) – The filepath or string.
Returns: list of tokens (strings)
Return_type: list
-
path2insight.tokenizers.tokenizers.
path_tokenizer
(x)¶ Make parts of a file path.
Parameters: x (WindowsFilePath, PosixFilePath, str) – The filepath.
Returns: list of path parts (strings)
Return_type: list
-
path2insight.tokenizers.tokenizers.
title_splitter
(x)¶ Make tokens from title formatted strings
Parameters: x (WindowsFilePath, PosixFilePath, str) – The filepath or string.
Returns: list of tokens (strings)
Return_type: list
Tagging¶
This module implements a filepath tagger. The object structure of this filepath tagger is based on the tagger objects in the Natural Language Toolkit (NLTK).
-
class
path2insight.explore.tagger.
BaseTypeTagger
(tokenizer=None, tag_names=None)¶ The base class for the type taggers.
Parameters:
-
class
path2insight.explore.tagger.
CompressionTagger
(tags=..., na_tag='', ignore_case=True, use_wildcards=True)¶ Extension tagger for compression and archiving.
This tagger tags compressed file paths based on their extension. There are three different types of tags in this tagger. The tags are:
- ARCHIVE
- COMPRESSION
- ARCHIVE_AND_COMPRESSION
Parameters: - ignore_case (bool) – Case-insensitive extension tagging. Default True.
- use_wildcards (bool) – Use Unix shell-style wildcards like * and ?. Default True.
-
class
path2insight.explore.tagger.
DocumentTagger
(tags=..., na_tag='', ignore_case=True, use_wildcards=True)¶ Extension tagger for documents.
This tagger tags document file paths based on their extension. The tags are:
- DOCUMENT
- PRESENTATION
Parameters: - ignore_case (bool) – Case-insensitive extension tagging. Default True.
- use_wildcards (bool) – Use Unix shell-style wildcards like * and ?. Default True.
-
class
path2insight.explore.tagger.
ExtensionTagger
(tags={}, na_tag='', ignore_case=False, use_wildcards=True)¶ Extension tagger based on dict of tags.
Unix shell-style wildcards like * and ? are supported.
Parameters: - ignore_case (bool) – Case-insensitive extension tagging. Default False.
- use_wildcards (bool) – Use Unix shell-style wildcards like * and ?. Default True.
Note:
Use an OrderedDict when the order in which the tags are applied matters.
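Wildcard-based extension tagging can be sketched with the standard fnmatch module. The layout of the tags dict is not documented here, so this example assumes a mapping from extension pattern to tag in which the first matching pattern wins; tag_extensions is a hypothetical helper, not the ExtensionTagger class.

```python
from fnmatch import fnmatchcase
from pathlib import PureWindowsPath


def tag_extensions(paths, tags, na_tag="", ignore_case=False, use_wildcards=True):
    """Yield (path, tag) pairs; the first matching pattern in `tags` wins."""
    for path in paths:
        ext = path.suffix.lower() if ignore_case else path.suffix
        result = na_tag
        for pattern, tag in tags.items():
            pat = pattern.lower() if ignore_case else pattern
            # Shell-style match, or exact comparison when wildcards are off.
            matched = fnmatchcase(ext, pat) if use_wildcards else ext == pat
            if matched:
                result = tag
                break
        yield path, result


# Assumed layout: extension pattern -> tag (dict order decides precedence).
tags = {".doc*": "DOCUMENT", ".ppt*": "PRESENTATION"}
paths = [
    PureWindowsPath("D:/a/report.DOCX"),
    PureWindowsPath("D:/a/slides.pptx"),
    PureWindowsPath("D:/a/data.bin"),
]

insensitive = [tag for _, tag in tag_extensions(paths, tags, ignore_case=True)]
sensitive = [tag for _, tag in tag_extensions(paths, tags, na_tag="OTHER")]
```

Because the loop stops at the first match, the iteration order of the mapping decides which tag a path receives when several patterns apply.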
-
class
path2insight.explore.tagger.
FolderTagger
¶ [EXPERIMENTAL] A tagger that assigns a FOLDER or FILE tag to each path.
>>> from path2insight import WindowsFilePath
>>> from path2insight.explore import FolderTagger
>>> folder_tagger = FolderTagger()
>>> list(folder_tagger.tag([WindowsFilePath('D:/armel/file.xyz')]))
[(WindowsFilePath('D:/armel/file.xyz'), 'FILE')]
-
class
path2insight.explore.tagger.
Tagger
¶ Base class for the taggers.
-
class
path2insight.explore.tagger.
TokenTypeTagger
(tokenizer=<function default_tokenizer>, tag_names=None)¶ A tagger that tags each filepath part (and extension) with the following labels: drive (DRV), folder (FLD), stem (STM) and extension (EXT).
Parameters: - tokenizer (callable) – A function that converts a filepath or string into tokens.
- tag_names (list) – The names of the four tags that this tagger uses. The default tags are drive="DRV", folder="FLD", stem="STM" and extension="EXT".
-
class
path2insight.explore.tagger.
TypeTagger
(tag_names=None)¶ A tagger that tags each filepath part (and extension) with the following labels: drive (DRV), folder (FLD), stem (STM) and extension (EXT).
Parameters: tag_names (list) – The names of the four tags that this tagger uses. The default tags are drive="DRV", folder="FLD", stem="STM" and extension="EXT".
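The four labels can be illustrated by decomposing a path with pathlib. This sketch only shows which part would receive which label; tag_parts is a hypothetical helper, and the output format of the actual taggers may differ.

```python
from pathlib import PureWindowsPath


def tag_parts(path, tag_names=("DRV", "FLD", "STM", "EXT")):
    """Return (part, tag) pairs for the drive, folders, stem and extension."""
    drv, fld, stm, ext = tag_names
    tagged = []
    if path.drive:
        tagged.append((path.drive, drv))
    # Folders are the parts between the anchor and the final name.
    folders = path.parent.parts[1:] if path.anchor else path.parent.parts
    tagged.extend((folder, fld) for folder in folders)
    if path.stem:
        tagged.append((path.stem, stm))
    if path.suffix:
        tagged.append((path.suffix, ext))
    return tagged


tagged = tag_parts(PureWindowsPath("D:/armel/file.xyz"))
```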
Datasets¶
Create¶
-
path2insight.collect.
walk
(d, delay=None, **kwargs)¶ Walk the file system like os.walk.
Function to collect file paths from the file system. This function is useful for collecting and sharing the file paths. The function is similar to os.walk.
Parameters: Returns: The file paths and folder paths, as a tuple (files, folders).
Return_type: (list, list)
Example: Collect and share filepaths with pandas.
>>> import pandas as pd
>>> import path2insight
>>> files, folders = path2insight.walk('.')
>>> pd.DataFrame(files).to_csv("export_filepaths.csv", index=False)
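Collecting files and folders into two lists, in the spirit of the (files, folders) return value, can be done directly with os.walk. The sketch below is an approximation under assumptions: collect_paths is a hypothetical helper that returns plain strings and has no equivalent of the delay argument.

```python
import os
import tempfile


def collect_paths(root):
    """Walk `root` and return (files, folders) as two lists of path strings."""
    files, folders = [], []
    for dirpath, dirnames, filenames in os.walk(root):
        folders.extend(os.path.join(dirpath, d) for d in dirnames)
        files.extend(os.path.join(dirpath, f) for f in filenames)
    return files, folders


# Build a tiny throwaway tree so the example is self-contained.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "docs"))
    open(os.path.join(root, "docs", "file.txt"), "w").close()
    files, folders = collect_paths(root)
    relative_files = [os.path.relpath(f, root) for f in files]
    relative_folders = [os.path.relpath(d, root) for d in folders]
```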
Examples¶
Path2Insight comes with several real, public datasets. They are available through the submodule ‘datasets’. See the example below:
from path2insight.datasets import load_pride
-
path2insight.datasets.external.
load_ensembl
(nrows=None, skiprows=None)¶ Load the filepaths of the Ensembl dataset (release 90).
“Ensembl is a genome browser for vertebrate genomes that supports research in comparative genomics, evolution, sequence variation and transcriptional regulation. Ensembl annotate genes, computes multiple alignments, predicts regulatory function and collects disease data. Ensembl tools include BLAST, BLAT, BioMart and the Variant Effect Predictor (VEP) for all supported species. (ensembl.org)”
The filepaths from release-90 of this dataset are loaded with this function. The data can be found at ftp://ftp.ensembl.org/pub/release-90/. The snapshot was taken on 16 November 2017 with a Linux device with the ftp://ftp.ensembl.org/pub/release-90/ as a mounted drive.
Parameters: - nrows – Number of rows of the file to read. Useful for reading pieces of large files.
- skiprows (list-like or integer or callable, default None) – Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of the file. See pandas.read_csv() for more information about this parameter.
Returns: A list of PosixFilePaths of the Ensembl dataset.
Return_type: list
-
path2insight.datasets.external.
load_pride
(nrows=None, skiprows=None)¶ Load the filepaths of the PRIDE proteomics archive.
“The PRIDE PRoteomics IDEntifications (PRIDE) database is a centralized, standards compliant, public data repository for proteomics data, including protein and peptide identifications, post-translational modifications and supporting spectral evidence. PRIDE is a core member in the ProteomeXchange (PX) consortium, which provides a single point for submitting mass spectrometry based proteomics data to public-domain repositories. Datasets are submitted to PRIDE via ProteomeXchange and are handled by expert biocurators. (https://www.ebi.ac.uk/pride/archive/)”
The filepaths of this dataset are loaded with this function. The data can be found at ftp://ftp.pride.ebi.ac.uk/pride/data/archive/. The snapshot was taken on 6 February 2018 with a Linux device with ftp://ftp.pride.ebi.ac.uk/pride/data/archive/ as a mounted drive.
Parameters: - nrows – Number of rows of the file to read. Useful for reading pieces of large files.
- skiprows (list-like or integer or callable, default None) – Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of the file. See pandas.read_csv() for more information about this parameter.
Returns: A list of PosixFilePaths of the PRIDE dataset.
Return_type: list
Misc¶
-
path2insight.external.nltk.
bigrams
(sequence, **kwargs)¶ Return the bigrams generated from a sequence of items, as an iterator. For example:
>>> from path2insight.external.nltk import bigrams
>>> list(bigrams([1,2,3,4,5]))
[(1, 2), (2, 3), (3, 4), (4, 5)]
Wrap with list for a list version of this function.
Parameters: sequence (sequence or iter) – the source data to be converted into bigrams
Return type: iter(tuple)
-
path2insight.external.nltk.
everygrams
(sequence, min_len=1, max_len=-1, **kwargs)¶ Returns all possible ngrams generated from a sequence of items, as an iterator.
>>> from path2insight.external.nltk import everygrams
>>> sent = 'a b c'.split()
>>> list(everygrams(sent))
[('a',), ('b',), ('c',), ('a', 'b'), ('b', 'c'), ('a', 'b', 'c')]
>>> list(everygrams(sent, max_len=2))
[('a',), ('b',), ('c',), ('a', 'b'), ('b', 'c')]
Parameters: Return type: iter(tuple)
-
path2insight.external.nltk.
ngrams
(sequence, n, pad_left=False, pad_right=False, left_pad_symbol=None, right_pad_symbol=None)¶ Return the ngrams generated from a sequence of items, as an iterator. For example:
>>> from path2insight.external.nltk import ngrams
>>> list(ngrams([1,2,3,4,5], 3))
[(1, 2, 3), (2, 3, 4), (3, 4, 5)]
Wrap with list for a list version of this function. Set pad_left or pad_right to True in order to get additional ngrams:
>>> list(ngrams([1,2,3,4,5], 2, pad_right=True))
[(1, 2), (2, 3), (3, 4), (4, 5), (5, None)]
>>> list(ngrams([1,2,3,4,5], 2, pad_right=True, right_pad_symbol='</s>'))
[(1, 2), (2, 3), (3, 4), (4, 5), (5, '</s>')]
>>> list(ngrams([1,2,3,4,5], 2, pad_left=True, left_pad_symbol='<s>'))
[('<s>', 1), (1, 2), (2, 3), (3, 4), (4, 5)]
>>> list(ngrams([1,2,3,4,5], 2, pad_left=True, pad_right=True, left_pad_symbol='<s>', right_pad_symbol='</s>'))
[('<s>', 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, '</s>')]
Parameters: - sequence (sequence or iter) – the source data to be converted into ngrams
- n (int) – the degree of the ngrams
- pad_left (bool) – whether the ngrams should be left-padded
- pad_right (bool) – whether the ngrams should be right-padded
- left_pad_symbol (any) – the symbol to use for left padding (default is None)
- right_pad_symbol (any) – the symbol to use for right padding (default is None)
Return type: sequence or iter
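The sliding-window behaviour and the effect of padding can be reproduced with itertools. The sketch below is an independent illustration, not the bundled NLTK code; sliding_ngrams is a hypothetical name that mirrors the parameters listed above.

```python
from itertools import chain, islice, repeat


def sliding_ngrams(sequence, n, pad_left=False, pad_right=False,
                   left_pad_symbol=None, right_pad_symbol=None):
    """Yield consecutive n-tuples from `sequence`, optionally padded."""
    items = iter(sequence)
    if pad_left:
        items = chain(repeat(left_pad_symbol, n - 1), items)
    if pad_right:
        items = chain(items, repeat(right_pad_symbol, n - 1))
    # Fill the first window, then slide it one item at a time.
    window = tuple(islice(items, n))
    if len(window) == n:
        yield window
    for item in items:
        window = window[1:] + (item,)
        yield window


trigram_list = list(sliding_ngrams([1, 2, 3, 4, 5], 3))
padded = list(sliding_ngrams([1, 2, 3, 4, 5], 2, pad_right=True,
                             right_pad_symbol="</s>"))
```

Padding adds n - 1 symbols on the chosen side, which is why a bigram request gains exactly one extra tuple per padded side.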
-
path2insight.external.nltk.
pad_sequence
(sequence, n, pad_left=False, pad_right=False, left_pad_symbol=None, right_pad_symbol=None)¶ Returns a padded sequence of items before ngram extraction.
>>> list(pad_sequence([1,2,3,4,5], 2, pad_left=True, pad_right=True, left_pad_symbol='<s>', right_pad_symbol='</s>'))
['<s>', 1, 2, 3, 4, 5, '</s>']
>>> list(pad_sequence([1,2,3,4,5], 2, pad_left=True, left_pad_symbol='<s>'))
['<s>', 1, 2, 3, 4, 5]
>>> list(pad_sequence([1,2,3,4,5], 2, pad_right=True, right_pad_symbol='</s>'))
[1, 2, 3, 4, 5, '</s>']
Parameters: - sequence (sequence or iter) – the source data to be padded
- n (int) – the degree of the ngrams
- pad_left (bool) – whether the ngrams should be left-padded
- pad_right (bool) – whether the ngrams should be right-padded
- left_pad_symbol (any) – the symbol to use for left padding (default is None)
- right_pad_symbol (any) – the symbol to use for right padding (default is None)
Return type: sequence or iter
-
path2insight.external.nltk.
skipgrams
(sequence, n, k, **kwargs)¶ Returns all possible skipgrams generated from a sequence of items, as an iterator. Skipgrams are ngrams that allow tokens to be skipped. Refer to http://homepages.inf.ed.ac.uk/ballison/pdf/lrec_skipgrams.pdf
>>> from path2insight.external.nltk import skipgrams
>>> sent = "Insurgents killed in ongoing fighting".split()
>>> list(skipgrams(sent, 2, 2))
[('Insurgents', 'killed'), ('Insurgents', 'in'), ('Insurgents', 'ongoing'), ('killed', 'in'), ('killed', 'ongoing'), ('killed', 'fighting'), ('in', 'ongoing'), ('in', 'fighting'), ('ongoing', 'fighting')]
>>> list(skipgrams(sent, 3, 2))
[('Insurgents', 'killed', 'in'), ('Insurgents', 'killed', 'ongoing'), ('Insurgents', 'killed', 'fighting'), ('Insurgents', 'in', 'ongoing'), ('Insurgents', 'in', 'fighting'), ('Insurgents', 'ongoing', 'fighting'), ('killed', 'in', 'ongoing'), ('killed', 'in', 'fighting'), ('killed', 'ongoing', 'fighting'), ('in', 'ongoing', 'fighting')]
Parameters: Return type: iter(tuple)
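A skipgram of degree n with skip distance k keeps a starting token and picks the remaining n - 1 tokens, in order, from the next n + k - 1 positions. The sketch below implements that reading for finite sequences; simple_skipgrams is a hypothetical helper and not the bundled NLTK function.

```python
from itertools import combinations


def simple_skipgrams(sequence, n, k):
    """Yield n-token skipgrams that skip at most k tokens in total."""
    items = list(sequence)  # a finite sequence is assumed
    for start in range(len(items) - n + 1):
        head = (items[start],)
        # Candidates for the remaining n - 1 tokens: the next n + k - 1 items.
        tail = items[start + 1:start + n + k]
        for rest in combinations(tail, n - 1):
            yield head + rest


sent = "Insurgents killed in ongoing fighting".split()
pairs = list(simple_skipgrams(sent, 2, 2))
triples = list(simple_skipgrams(sent, 3, 2))
```

For this sentence the helper yields the same 9 pairs and 10 triples as the documented example.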
-
path2insight.external.nltk.
trigrams
(sequence, **kwargs)¶ Return the trigrams generated from a sequence of items, as an iterator. For example:
>>> from path2insight.external.nltk import trigrams
>>> list(trigrams([1,2,3,4,5]))
[(1, 2, 3), (2, 3, 4), (3, 4, 5)]
Wrap with list for a list version of this function.
Parameters: sequence (sequence or iter) – the source data to be converted into trigrams
Return type: iter(tuple)