API Reference

FilePath Objects

class path2insight.WindowsFilePath(*args)

Object to analyse a Windows file or folder path.

The WindowsFilePath inherits from pathlib.PureWindowsPath in Python >3.4. See https://docs.python.org/3/library/pathlib.html#methods-and-properties for all properties and methods.


>>> p = WindowsFilePath("D://Documents/ProjectX/DEMO code.py")
>>> str(p)
'D:\\Documents\\ProjectX\\DEMO code.py'
>>> p.lower_name().tokenize_stem()
['demo', 'code']
>>> p.extension

anchor

The concatenation of the drive and root, or ‘’.


as_posix()

Return the string representation of the path with forward (/) slashes.


as_uri()

Return the path as a ‘file’ URI.
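
Because the class builds on pathlib.PureWindowsPath, the inherited properties can be tried with the standard library alone. The following is a minimal sketch that uses pathlib.PureWindowsPath as a stand-in for WindowsFilePath (an assumption; path2insight itself is not imported here).

```python
from pathlib import PureWindowsPath

# Stand-in for WindowsFilePath: only inherited pathlib behaviour is shown.
p = PureWindowsPath("D:/Documents/ProjectX/DEMO code.py")

drive = p.drive            # drive letter with colon
root = p.root              # a single backslash
anchor = p.anchor          # drive and root concatenated
posix_form = p.as_posix()  # same path written with forward slashes

print(drive, root, anchor, posix_form)
```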

capitalize(*args, **kwargs)

Apply string function ‘capitalize’ to filename parts.

capitalize_name(*args, **kwargs)

Apply string function ‘capitalize’ to the name.

capitalize_stem(*args, **kwargs)

Apply string function ‘capitalize’ to the stem.
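
Every string function in this class comes in three variants: one for the filename parts, one with the suffix _name and one with the suffix _stem. The class example chains lower_name() with tokenize_stem(), which suggests that string-valued results come back as a new path object. The sketch below illustrates that idea for the stem only. The helper apply_to_stem is hypothetical and uses pathlib.PureWindowsPath as a stand-in; it is not the library's implementation.

```python
from pathlib import PureWindowsPath


def apply_to_stem(path, func_name, *args, **kwargs):
    """Hypothetical helper: call a str method on the stem of ``path``."""
    result = getattr(path.stem, func_name)(*args, **kwargs)
    if isinstance(result, str):
        # String results replace the stem; the suffix is kept as-is.
        return path.with_name(result + path.suffix)
    # Non-string results (bool, int, list, ...) are returned directly.
    return result


p = PureWindowsPath("D:/Documents/ProjectX/DEMO code.py")
lowered = apply_to_stem(p, "lower")        # new path with a lowercased stem
is_upper = apply_to_stem(p, "isupper")     # plain bool
n_spaces = apply_to_stem(p, "count", " ")  # plain int

print(lowered, is_upper, n_spaces)
```

Whether the real methods return paths, strings or other values for each string function is not stated in this reference, so treat the return handling above as an assumption.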

casefold(*args, **kwargs)

Apply string function ‘casefold’ to filename parts.

casefold_name(*args, **kwargs)

Apply string function ‘casefold’ to the name.

casefold_stem(*args, **kwargs)

Apply string function ‘casefold’ to the stem.

center(*args, **kwargs)

Apply string function ‘center’ to filename parts.

center_name(*args, **kwargs)

Apply string function ‘center’ to the name.

center_stem(*args, **kwargs)

Apply string function ‘center’ to the stem.

count(*args, **kwargs)

Apply string function ‘count’ to filename parts.

count_name(*args, **kwargs)

Apply string function ‘count’ to the name.

count_stem(*args, **kwargs)

Apply string function ‘count’ to the stem.


depth

Compute the depth of the path.

>>> WindowsFilePath('R:/Armel/path2insight/demo.py').depth

drive

The drive prefix (letter or UNC path), if any.

encode(*args, **kwargs)

Apply string function ‘encode’ to filename parts.

encode_name(*args, **kwargs)

Apply string function ‘encode’ to the name.

encode_stem(*args, **kwargs)

Apply string function ‘encode’ to the stem.

endswith(*args, **kwargs)

Apply string function ‘endswith’ to filename parts.

endswith_name(*args, **kwargs)

Apply string function ‘endswith’ to the name.

endswith_stem(*args, **kwargs)

Apply string function ‘endswith’ to the stem.

expandtabs(*args, **kwargs)

Apply string function ‘expandtabs’ to filename parts.

expandtabs_name(*args, **kwargs)

Apply string function ‘expandtabs’ to the name.

expandtabs_stem(*args, **kwargs)

Apply string function ‘expandtabs’ to the stem.


extension

Masked property from self.suffix


extensions

Masked property from self.suffixes

find(*args, **kwargs)

Apply string function ‘find’ to filename parts.

find_name(*args, **kwargs)

Apply string function ‘find’ to the name.

find_stem(*args, **kwargs)

Apply string function ‘find’ to the stem.

format(*args, **kwargs)

Apply string function ‘format’ to filename parts.

format_map(*args, **kwargs)

Apply string function ‘format_map’ to filename parts.

format_map_name(*args, **kwargs)

Apply string function ‘format_map’ to the name.

format_map_stem(*args, **kwargs)

Apply string function ‘format_map’ to the stem.

format_name(*args, **kwargs)

Apply string function ‘format’ to the name.

format_stem(*args, **kwargs)

Apply string function ‘format’ to the stem.

index(*args, **kwargs)

Apply string function ‘index’ to filename parts.

index_name(*args, **kwargs)

Apply string function ‘index’ to the name.

index_stem(*args, **kwargs)

Apply string function ‘index’ to the stem.


is_absolute()

True if the path is absolute (has both a root and, if applicable, a drive).


is_reserved()

Return True if the path contains one of the special names reserved by the system, if any.

isalnum(*args, **kwargs)

Apply string function ‘isalnum’ to filename parts.

isalnum_name(*args, **kwargs)

Apply string function ‘isalnum’ to the name.

isalnum_stem(*args, **kwargs)

Apply string function ‘isalnum’ to the stem.

isalpha(*args, **kwargs)

Apply string function ‘isalpha’ to filename parts.

isalpha_name(*args, **kwargs)

Apply string function ‘isalpha’ to the name.

isalpha_stem(*args, **kwargs)

Apply string function ‘isalpha’ to the stem.

isascii(*args, **kwargs)

Apply string function ‘isascii’ to filename parts.

isascii_name(*args, **kwargs)

Apply string function ‘isascii’ to the name.

isascii_stem(*args, **kwargs)

Apply string function ‘isascii’ to the stem.

isdecimal(*args, **kwargs)

Apply string function ‘isdecimal’ to filename parts.

isdecimal_name(*args, **kwargs)

Apply string function ‘isdecimal’ to the name.

isdecimal_stem(*args, **kwargs)

Apply string function ‘isdecimal’ to the stem.

isdigit(*args, **kwargs)

Apply string function ‘isdigit’ to filename parts.

isdigit_name(*args, **kwargs)

Apply string function ‘isdigit’ to the name.

isdigit_stem(*args, **kwargs)

Apply string function ‘isdigit’ to the stem.

isidentifier(*args, **kwargs)

Apply string function ‘isidentifier’ to filename parts.

isidentifier_name(*args, **kwargs)

Apply string function ‘isidentifier’ to the name.

isidentifier_stem(*args, **kwargs)

Apply string function ‘isidentifier’ to the stem.

islower(*args, **kwargs)

Apply string function ‘islower’ to filename parts.

islower_name(*args, **kwargs)

Apply string function ‘islower’ to the name.

islower_stem(*args, **kwargs)

Apply string function ‘islower’ to the stem.

isnumeric(*args, **kwargs)

Apply string function ‘isnumeric’ to filename parts.

isnumeric_name(*args, **kwargs)

Apply string function ‘isnumeric’ to the name.

isnumeric_stem(*args, **kwargs)

Apply string function ‘isnumeric’ to the stem.

isprintable(*args, **kwargs)

Apply string function ‘isprintable’ to filename parts.

isprintable_name(*args, **kwargs)

Apply string function ‘isprintable’ to the name.

isprintable_stem(*args, **kwargs)

Apply string function ‘isprintable’ to the stem.

isspace(*args, **kwargs)

Apply string function ‘isspace’ to filename parts.

isspace_name(*args, **kwargs)

Apply string function ‘isspace’ to the name.

isspace_stem(*args, **kwargs)

Apply string function ‘isspace’ to the stem.

istitle(*args, **kwargs)

Apply string function ‘istitle’ to filename parts.

istitle_name(*args, **kwargs)

Apply string function ‘istitle’ to the name.

istitle_stem(*args, **kwargs)

Apply string function ‘istitle’ to the stem.

isupper(*args, **kwargs)

Apply string function ‘isupper’ to filename parts.

isupper_name(*args, **kwargs)

Apply string function ‘isupper’ to the name.

isupper_stem(*args, **kwargs)

Apply string function ‘isupper’ to the stem.

join(*args, **kwargs)

Apply string function ‘join’ to filename parts.

join_name(*args, **kwargs)

Apply string function ‘join’ to the name.

join_stem(*args, **kwargs)

Apply string function ‘join’ to the stem.


joinpath(*args)

Combine this path with one or several arguments, and return a new path representing either a subpath (if all arguments are relative paths) or a totally different path (if one of the arguments is anchored).
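
The two outcomes described above (a subpath versus a totally different path) can be seen with the standard library. This sketch uses pathlib.PureWindowsPath as a stand-in for WindowsFilePath; the folder names are made up for illustration.

```python
from pathlib import PureWindowsPath

base = PureWindowsPath("C:/data")

# Relative segments extend the base path.
sub = base.joinpath("raw", "file.txt")

# An anchored segment (with its own drive and root) replaces the base path.
other = base.joinpath("D:/archive/file.txt")

print(sub)
print(other)
```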

ljust(*args, **kwargs)

Apply string function ‘ljust’ to filename parts.

ljust_name(*args, **kwargs)

Apply string function ‘ljust’ to the name.

ljust_stem(*args, **kwargs)

Apply string function ‘ljust’ to the stem.

lower(*args, **kwargs)

Apply string function ‘lower’ to filename parts.

lower_name(*args, **kwargs)

Apply string function ‘lower’ to the name.

lower_stem(*args, **kwargs)

Apply string function ‘lower’ to the stem.

lstrip(*args, **kwargs)

Apply string function ‘lstrip’ to filename parts.

lstrip_name(*args, **kwargs)

Apply string function ‘lstrip’ to the name.

lstrip_stem(*args, **kwargs)

Apply string function ‘lstrip’ to the stem.

maketrans(*args, **kwargs)

Apply string function ‘maketrans’ to filename parts.

maketrans_name(*args, **kwargs)

Apply string function ‘maketrans’ to the name.

maketrans_stem(*args, **kwargs)

Apply string function ‘maketrans’ to the stem.


match(path_pattern)

Return True if this path matches the given pattern.


name

The final path component, if any.


parent

The logical parent of the path.


parents

A sequence of this path’s logical parents.

partition(*args, **kwargs)

Apply string function ‘partition’ to filename parts.

partition_name(*args, **kwargs)

Apply string function ‘partition’ to the name.

partition_stem(*args, **kwargs)

Apply string function ‘partition’ to the stem.


parts

An object providing sequence-like access to the components in the filesystem path.


relative_to(*other)

Return the relative path to another path identified by the passed arguments. If the operation is not possible (because this is not a subpath of the other path), raise ValueError.
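
A short sketch of both cases, the successful one and the ValueError, again with pathlib.PureWindowsPath standing in for WindowsFilePath and with made-up paths:

```python
from pathlib import PureWindowsPath

p = PureWindowsPath("C:/data/raw/file.txt")

# C:/data is a parent of p, so a relative path can be computed.
rel = p.relative_to("C:/data")

# C:/other is not a parent of p, so ValueError is raised.
try:
    p.relative_to("C:/other")
    failed = False
except ValueError:
    failed = True

print(rel, failed)
```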

replace(*args, **kwargs)

Apply string function ‘replace’ to filename parts.

replace_name(*args, **kwargs)

Apply string function ‘replace’ to the name.

replace_stem(*args, **kwargs)

Apply string function ‘replace’ to the stem.

rfind(*args, **kwargs)

Apply string function ‘rfind’ to filename parts.

rfind_name(*args, **kwargs)

Apply string function ‘rfind’ to the name.

rfind_stem(*args, **kwargs)

Apply string function ‘rfind’ to the stem.

rindex(*args, **kwargs)

Apply string function ‘rindex’ to filename parts.

rindex_name(*args, **kwargs)

Apply string function ‘rindex’ to the name.

rindex_stem(*args, **kwargs)

Apply string function ‘rindex’ to the stem.

rjust(*args, **kwargs)

Apply string function ‘rjust’ to filename parts.

rjust_name(*args, **kwargs)

Apply string function ‘rjust’ to the name.

rjust_stem(*args, **kwargs)

Apply string function ‘rjust’ to the stem.


root

The root of the path, if any.

rpartition(*args, **kwargs)

Apply string function ‘rpartition’ to filename parts.

rpartition_name(*args, **kwargs)

Apply string function ‘rpartition’ to the name.

rpartition_stem(*args, **kwargs)

Apply string function ‘rpartition’ to the stem.

rsplit(*args, **kwargs)

Apply string function ‘rsplit’ to filename parts.

rsplit_name(*args, **kwargs)

Apply string function ‘rsplit’ to the name.

rsplit_stem(*args, **kwargs)

Apply string function ‘rsplit’ to the stem.

rstrip(*args, **kwargs)

Apply string function ‘rstrip’ to filename parts.

rstrip_name(*args, **kwargs)

Apply string function ‘rstrip’ to the name.

rstrip_stem(*args, **kwargs)

Apply string function ‘rstrip’ to the stem.

split(*args, **kwargs)

Apply string function ‘split’ to filename parts.

split_name(*args, **kwargs)

Apply string function ‘split’ to the name.

split_stem(*args, **kwargs)

Apply string function ‘split’ to the stem.

splitlines(*args, **kwargs)

Apply string function ‘splitlines’ to filename parts.

splitlines_name(*args, **kwargs)

Apply string function ‘splitlines’ to the name.

splitlines_stem(*args, **kwargs)

Apply string function ‘splitlines’ to the stem.

startswith(*args, **kwargs)

Apply string function ‘startswith’ to filename parts.

startswith_name(*args, **kwargs)

Apply string function ‘startswith’ to the name.

startswith_stem(*args, **kwargs)

Apply string function ‘startswith’ to the stem.


stem

The final path component, minus its last suffix.

strip(*args, **kwargs)

Apply string function ‘strip’ to filename parts.

strip_name(*args, **kwargs)

Apply string function ‘strip’ to the name.

strip_stem(*args, **kwargs)

Apply string function ‘strip’ to the stem.


suffix

The final component’s last suffix, if any.


suffixes

A list of the final component’s suffixes, if any.

swapcase(*args, **kwargs)

Apply string function ‘swapcase’ to filename parts.

swapcase_name(*args, **kwargs)

Apply string function ‘swapcase’ to the name.

swapcase_stem(*args, **kwargs)

Apply string function ‘swapcase’ to the stem.

title(*args, **kwargs)

Apply string function ‘title’ to filename parts.

title_name(*args, **kwargs)

Apply string function ‘title’ to the name.

title_stem(*args, **kwargs)

Apply string function ‘title’ to the stem.

tokenize(token_pattern='(?u)([a-zA-Z0-9\\:]+)(?=[^a-zA-Z0-9\\:]|$)', exclude_extension=True)

Tokenise the name (without extension)


Tokenise the name


Tokenise the name
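
To see what the default token_pattern matches, it can be applied directly with the re module. The helper tokenize_text below is hypothetical; it only shows the regular expression at work and assumes the library uses it in a findall-like way.

```python
import re

# Default pattern as listed in the tokenize signature, written as a raw string.
TOKEN_PATTERN = r"(?u)([a-zA-Z0-9\:]+)(?=[^a-zA-Z0-9\:]|$)"


def tokenize_text(text, token_pattern=TOKEN_PATTERN):
    """Hypothetical helper: return all tokens matched in ``text``."""
    return re.findall(token_pattern, text)


tokens = tokenize_text("demo code")       # whitespace separates tokens
mixed = tokenize_text("report_v2-final")  # so do '_' and '-'

print(tokens, mixed)
```

Any character outside letters, digits and the colon acts as a separator under this pattern.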

translate(*args, **kwargs)

Apply string function ‘translate’ to filename parts.

translate_name(*args, **kwargs)

Apply string function ‘translate’ to the name.

translate_stem(*args, **kwargs)

Apply string function ‘translate’ to the stem.

upper(*args, **kwargs)

Apply string function ‘upper’ to filename parts.

upper_name(*args, **kwargs)

Apply string function ‘upper’ to the name.

upper_stem(*args, **kwargs)

Apply string function ‘upper’ to the stem.


with_name(name)

Return a new path with the file name changed.


with_suffix(suffix)

Return a new path with the file suffix changed. If the path has no suffix, add given suffix. If the given suffix is an empty string, remove the suffix from the path.
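
The relation between the name, stem and suffix properties and the two with_ methods is easiest to see on a file with a double extension. The sketch uses pathlib.PureWindowsPath as a stand-in for WindowsFilePath; the file name is made up.

```python
from pathlib import PureWindowsPath

p = PureWindowsPath("C:/data/archive.tar.gz")

name = p.name          # full final component
stem = p.stem          # name minus the last suffix only
suffix = p.suffix      # last suffix
suffixes = p.suffixes  # all suffixes, in order

renamed = p.with_name("backup.zip")  # replace the whole file name
swapped = p.with_suffix(".zip")      # replace only the last suffix
stripped = p.with_suffix("")         # remove the last suffix

print(name, stem, suffix, suffixes)
print(renamed, swapped, stripped)
```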

zfill(*args, **kwargs)

Apply string function ‘zfill’ to filename parts.

zfill_name(*args, **kwargs)

Apply string function ‘zfill’ to the name.

zfill_stem(*args, **kwargs)

Apply string function ‘zfill’ to the stem.

class path2insight.PosixFilePath(*args)

Object to analyse a Posix file or folder path.

The PosixFilePath inherits from pathlib.PurePosixPath in Python >3.4. See https://docs.python.org/3/library/pathlib.html#methods-and-properties for all properties and methods.


anchor

The concatenation of the drive and root, or ‘’.


as_posix()

Return the string representation of the path with forward (/) slashes.


as_uri()

Return the path as a ‘file’ URI.
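
Posix paths have no drive, so the anchor reduces to the root. A minimal sketch with pathlib.PurePosixPath as a stand-in for PosixFilePath (an assumption; the path itself is made up):

```python
from pathlib import PurePosixPath

p = PurePosixPath("/home/armel/path2insight/demo.py")

drive = p.drive              # always empty on Posix paths
anchor = p.anchor            # just the root
parts = p.parts              # the root is the first part
absolute = p.is_absolute()

print(repr(drive), anchor, parts, absolute)
```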

capitalize(*args, **kwargs)

Apply string function ‘capitalize’ to filename parts.

capitalize_name(*args, **kwargs)

Apply string function ‘capitalize’ to the name.

capitalize_stem(*args, **kwargs)

Apply string function ‘capitalize’ to the stem.

casefold(*args, **kwargs)

Apply string function ‘casefold’ to filename parts.

casefold_name(*args, **kwargs)

Apply string function ‘casefold’ to the name.

casefold_stem(*args, **kwargs)

Apply string function ‘casefold’ to the stem.

center(*args, **kwargs)

Apply string function ‘center’ to filename parts.

center_name(*args, **kwargs)

Apply string function ‘center’ to the name.

center_stem(*args, **kwargs)

Apply string function ‘center’ to the stem.

count(*args, **kwargs)

Apply string function ‘count’ to filename parts.

count_name(*args, **kwargs)

Apply string function ‘count’ to the name.

count_stem(*args, **kwargs)

Apply string function ‘count’ to the stem.


depth

Compute the depth of the path.

>>> WindowsFilePath('R:/Armel/path2insight/demo.py').depth

drive

The drive prefix (letter or UNC path), if any.

encode(*args, **kwargs)

Apply string function ‘encode’ to filename parts.

encode_name(*args, **kwargs)

Apply string function ‘encode’ to the name.

encode_stem(*args, **kwargs)

Apply string function ‘encode’ to the stem.

endswith(*args, **kwargs)

Apply string function ‘endswith’ to filename parts.

endswith_name(*args, **kwargs)

Apply string function ‘endswith’ to the name.

endswith_stem(*args, **kwargs)

Apply string function ‘endswith’ to the stem.

expandtabs(*args, **kwargs)

Apply string function ‘expandtabs’ to filename parts.

expandtabs_name(*args, **kwargs)

Apply string function ‘expandtabs’ to the name.

expandtabs_stem(*args, **kwargs)

Apply string function ‘expandtabs’ to the stem.


extension

Masked property from self.suffix


extensions

Masked property from self.suffixes

find(*args, **kwargs)

Apply string function ‘find’ to filename parts.

find_name(*args, **kwargs)

Apply string function ‘find’ to the name.

find_stem(*args, **kwargs)

Apply string function ‘find’ to the stem.

format(*args, **kwargs)

Apply string function ‘format’ to filename parts.

format_map(*args, **kwargs)

Apply string function ‘format_map’ to filename parts.

format_map_name(*args, **kwargs)

Apply string function ‘format_map’ to the name.

format_map_stem(*args, **kwargs)

Apply string function ‘format_map’ to the stem.

format_name(*args, **kwargs)

Apply string function ‘format’ to the name.

format_stem(*args, **kwargs)

Apply string function ‘format’ to the stem.

index(*args, **kwargs)

Apply string function ‘index’ to filename parts.

index_name(*args, **kwargs)

Apply string function ‘index’ to the name.

index_stem(*args, **kwargs)

Apply string function ‘index’ to the stem.


is_absolute()

True if the path is absolute (has both a root and, if applicable, a drive).


is_reserved()

Return True if the path contains one of the special names reserved by the system, if any.

isalnum(*args, **kwargs)

Apply string function ‘isalnum’ to filename parts.

isalnum_name(*args, **kwargs)

Apply string function ‘isalnum’ to the name.

isalnum_stem(*args, **kwargs)

Apply string function ‘isalnum’ to the stem.

isalpha(*args, **kwargs)

Apply string function ‘isalpha’ to filename parts.

isalpha_name(*args, **kwargs)

Apply string function ‘isalpha’ to the name.

isalpha_stem(*args, **kwargs)

Apply string function ‘isalpha’ to the stem.

isascii(*args, **kwargs)

Apply string function ‘isascii’ to filename parts.

isascii_name(*args, **kwargs)

Apply string function ‘isascii’ to the name.

isascii_stem(*args, **kwargs)

Apply string function ‘isascii’ to the stem.

isdecimal(*args, **kwargs)

Apply string function ‘isdecimal’ to filename parts.

isdecimal_name(*args, **kwargs)

Apply string function ‘isdecimal’ to the name.

isdecimal_stem(*args, **kwargs)

Apply string function ‘isdecimal’ to the stem.

isdigit(*args, **kwargs)

Apply string function ‘isdigit’ to filename parts.

isdigit_name(*args, **kwargs)

Apply string function ‘isdigit’ to the name.

isdigit_stem(*args, **kwargs)

Apply string function ‘isdigit’ to the stem.

isidentifier(*args, **kwargs)

Apply string function ‘isidentifier’ to filename parts.

isidentifier_name(*args, **kwargs)

Apply string function ‘isidentifier’ to the name.

isidentifier_stem(*args, **kwargs)

Apply string function ‘isidentifier’ to the stem.

islower(*args, **kwargs)

Apply string function ‘islower’ to filename parts.

islower_name(*args, **kwargs)

Apply string function ‘islower’ to the name.

islower_stem(*args, **kwargs)

Apply string function ‘islower’ to the stem.

isnumeric(*args, **kwargs)

Apply string function ‘isnumeric’ to filename parts.

isnumeric_name(*args, **kwargs)

Apply string function ‘isnumeric’ to the name.

isnumeric_stem(*args, **kwargs)

Apply string function ‘isnumeric’ to the stem.

isprintable(*args, **kwargs)

Apply string function ‘isprintable’ to filename parts.

isprintable_name(*args, **kwargs)

Apply string function ‘isprintable’ to the name.

isprintable_stem(*args, **kwargs)

Apply string function ‘isprintable’ to the stem.

isspace(*args, **kwargs)

Apply string function ‘isspace’ to filename parts.

isspace_name(*args, **kwargs)

Apply string function ‘isspace’ to the name.

isspace_stem(*args, **kwargs)

Apply string function ‘isspace’ to the stem.

istitle(*args, **kwargs)

Apply string function ‘istitle’ to filename parts.

istitle_name(*args, **kwargs)

Apply string function ‘istitle’ to the name.

istitle_stem(*args, **kwargs)

Apply string function ‘istitle’ to the stem.

isupper(*args, **kwargs)

Apply string function ‘isupper’ to filename parts.

isupper_name(*args, **kwargs)

Apply string function ‘isupper’ to the name.

isupper_stem(*args, **kwargs)

Apply string function ‘isupper’ to the stem.

join(*args, **kwargs)

Apply string function ‘join’ to filename parts.

join_name(*args, **kwargs)

Apply string function ‘join’ to the name.

join_stem(*args, **kwargs)

Apply string function ‘join’ to the stem.


joinpath(*args)

Combine this path with one or several arguments, and return a new path representing either a subpath (if all arguments are relative paths) or a totally different path (if one of the arguments is anchored).

ljust(*args, **kwargs)

Apply string function ‘ljust’ to filename parts.

ljust_name(*args, **kwargs)

Apply string function ‘ljust’ to the name.

ljust_stem(*args, **kwargs)

Apply string function ‘ljust’ to the stem.

lower(*args, **kwargs)

Apply string function ‘lower’ to filename parts.

lower_name(*args, **kwargs)

Apply string function ‘lower’ to the name.

lower_stem(*args, **kwargs)

Apply string function ‘lower’ to the stem.

lstrip(*args, **kwargs)

Apply string function ‘lstrip’ to filename parts.

lstrip_name(*args, **kwargs)

Apply string function ‘lstrip’ to the name.

lstrip_stem(*args, **kwargs)

Apply string function ‘lstrip’ to the stem.

maketrans(*args, **kwargs)

Apply string function ‘maketrans’ to filename parts.

maketrans_name(*args, **kwargs)

Apply string function ‘maketrans’ to the name.

maketrans_stem(*args, **kwargs)

Apply string function ‘maketrans’ to the stem.


match(path_pattern)

Return True if this path matches the given pattern.


name

The final path component, if any.


parent

The logical parent of the path.


parents

A sequence of this path’s logical parents.

partition(*args, **kwargs)

Apply string function ‘partition’ to filename parts.

partition_name(*args, **kwargs)

Apply string function ‘partition’ to the name.

partition_stem(*args, **kwargs)

Apply string function ‘partition’ to the stem.


parts

An object providing sequence-like access to the components in the filesystem path.


relative_to(*other)

Return the relative path to another path identified by the passed arguments. If the operation is not possible (because this is not a subpath of the other path), raise ValueError.

replace(*args, **kwargs)

Apply string function ‘replace’ to filename parts.

replace_name(*args, **kwargs)

Apply string function ‘replace’ to the name.

replace_stem(*args, **kwargs)

Apply string function ‘replace’ to the stem.

rfind(*args, **kwargs)

Apply string function ‘rfind’ to filename parts.

rfind_name(*args, **kwargs)

Apply string function ‘rfind’ to the name.

rfind_stem(*args, **kwargs)

Apply string function ‘rfind’ to the stem.

rindex(*args, **kwargs)

Apply string function ‘rindex’ to filename parts.

rindex_name(*args, **kwargs)

Apply string function ‘rindex’ to the name.

rindex_stem(*args, **kwargs)

Apply string function ‘rindex’ to the stem.

rjust(*args, **kwargs)

Apply string function ‘rjust’ to filename parts.

rjust_name(*args, **kwargs)

Apply string function ‘rjust’ to the name.

rjust_stem(*args, **kwargs)

Apply string function ‘rjust’ to the stem.


root

The root of the path, if any.

rpartition(*args, **kwargs)

Apply string function ‘rpartition’ to filename parts.

rpartition_name(*args, **kwargs)

Apply string function ‘rpartition’ to the name.

rpartition_stem(*args, **kwargs)

Apply string function ‘rpartition’ to the stem.

rsplit(*args, **kwargs)

Apply string function ‘rsplit’ to filename parts.

rsplit_name(*args, **kwargs)

Apply string function ‘rsplit’ to the name.

rsplit_stem(*args, **kwargs)

Apply string function ‘rsplit’ to the stem.

rstrip(*args, **kwargs)

Apply string function ‘rstrip’ to filename parts.

rstrip_name(*args, **kwargs)

Apply string function ‘rstrip’ to the name.

rstrip_stem(*args, **kwargs)

Apply string function ‘rstrip’ to the stem.

split(*args, **kwargs)

Apply string function ‘split’ to filename parts.

split_name(*args, **kwargs)

Apply string function ‘split’ to the name.

split_stem(*args, **kwargs)

Apply string function ‘split’ to the stem.

splitlines(*args, **kwargs)

Apply string function ‘splitlines’ to filename parts.

splitlines_name(*args, **kwargs)

Apply string function ‘splitlines’ to the name.

splitlines_stem(*args, **kwargs)

Apply string function ‘splitlines’ to the stem.

startswith(*args, **kwargs)

Apply string function ‘startswith’ to filename parts.

startswith_name(*args, **kwargs)

Apply string function ‘startswith’ to the name.

startswith_stem(*args, **kwargs)

Apply string function ‘startswith’ to the stem.


stem

The final path component, minus its last suffix.

strip(*args, **kwargs)

Apply string function ‘strip’ to filename parts.

strip_name(*args, **kwargs)

Apply string function ‘strip’ to the name.

strip_stem(*args, **kwargs)

Apply string function ‘strip’ to the stem.


suffix

The final component’s last suffix, if any.


suffixes

A list of the final component’s suffixes, if any.

swapcase(*args, **kwargs)

Apply string function ‘swapcase’ to filename parts.

swapcase_name(*args, **kwargs)

Apply string function ‘swapcase’ to the name.

swapcase_stem(*args, **kwargs)

Apply string function ‘swapcase’ to the stem.

title(*args, **kwargs)

Apply string function ‘title’ to filename parts.

title_name(*args, **kwargs)

Apply string function ‘title’ to the name.

title_stem(*args, **kwargs)

Apply string function ‘title’ to the stem.

tokenize(token_pattern='(?u)([a-zA-Z0-9\\:]+)(?=[^a-zA-Z0-9\\:]|$)', exclude_extension=True)

Tokenise the name (without extension)


Tokenise the name


Tokenise the name

translate(*args, **kwargs)

Apply string function ‘translate’ to filename parts.

translate_name(*args, **kwargs)

Apply string function ‘translate’ to the name.

translate_stem(*args, **kwargs)

Apply string function ‘translate’ to the stem.

upper(*args, **kwargs)

Apply string function ‘upper’ to filename parts.

upper_name(*args, **kwargs)

Apply string function ‘upper’ to the name.

upper_stem(*args, **kwargs)

Apply string function ‘upper’ to the stem.


with_name(name)

Return a new path with the file name changed.


with_suffix(suffix)

Return a new path with the file suffix changed. If the path has no suffix, add given suffix. If the given suffix is an empty string, remove the suffix from the path.

zfill(*args, **kwargs)

Apply string function ‘zfill’ to filename parts.

zfill_name(*args, **kwargs)

Apply string function ‘zfill’ to the name.

zfill_stem(*args, **kwargs)

Apply string function ‘zfill’ to the stem.


path2insight.parse(obj, os_name=None)

Parse (list of) file paths.

Parse a list with file paths into list of WindowsFilePath and PosixFilePath objects. This function can parse list, tuple, numpy.ndarray and pandas.Series. This is done with one of the following parsers: parse_from_pandas, parse_from_numpy or parse_from_list.

>>> data = ['file1.xml', 'data/file1.txt', 'data/file2.txt']
>>> path2insight.parse(data, os_name='windows')

gives the same result as

>>> import pandas
>>> path2insight.parse(pandas.Series(data), os_name='windows')
  • obj (list, tuple, numpy.ndarray, pandas.Series) – An object such as a list, numpy array or pandas.Series with filepaths in the form of strings.
  • os_name (str) – The operating system on which the filepaths were collected. The options are ‘windows’ or ‘posix’ for Windows and POSIX systems respectively.

Returns a list with WindowsFilePaths and PosixFilePaths.
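
The idea of turning a collection of strings into path objects for a given os_name can be sketched with the standard library. The function parse_sketch below is hypothetical and uses the pathlib pure path classes as stand-ins for WindowsFilePath and PosixFilePath. It also requires os_name, because the behaviour of the real function for os_name=None is not described here.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical mapping from os_name to a stand-in path class.
_PATH_CLASSES = {"windows": PureWindowsPath, "posix": PurePosixPath}


def parse_sketch(obj, os_name):
    """Turn an iterable of path strings into a list of pure path objects."""
    try:
        path_class = _PATH_CLASSES[os_name]
    except KeyError:
        raise ValueError("os_name must be 'windows' or 'posix'") from None
    # list, tuple, numpy.ndarray and pandas.Series can all be iterated.
    return [path_class(str(item)) for item in obj]


data = ["file1.xml", "data/file1.txt", "data/file2.txt"]
as_windows = parse_sketch(data, os_name="windows")
as_posix = parse_sketch(tuple(data), os_name="posix")

print(as_windows)
print(as_posix)
```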



path2insight.parse_from_list(l, os_name=None)

Parse a list with file paths.

See path2insight.parse() for additional information.

  • l (list) – A list with filepaths in the form of strings.
  • os_name (str) – The operating system on which the filepaths were collected. The options are ‘windows’ or ‘posix’ for Windows and POSIX systems respectively.

Returns a list with WindowsFilePaths and PosixFilePaths.



path2insight.parse_from_numpy(np_object, os_name=None)

Parse a numpy array with file paths.

See path2insight.parse() for additional information.

  • np_object (numpy.ndarray) – A numpy.ndarray with filepaths in the form of strings.
  • os_name (str) – The operating system on which the filepaths were collected. The options are ‘windows’ or ‘posix’ for Windows and POSIX systems respectively.

Returns a list with WindowsFilePaths and PosixFilePaths.



path2insight.parse_from_pandas(pandas_object, os_name=None)

Parse a series or dataframe with file paths.

See path2insight.parse() for additional information.

  • pandas_object (pandas.Series or pandas.DataFrame) – A pandas.Series or pandas.DataFrame with filepaths in the form of strings.
  • os_name (str) – The operating system on which the filepaths were collected. The options are ‘windows’ or ‘posix’ for Windows and POSIX systems respectively.

Returns a list with WindowsFilePaths and PosixFilePaths.




path2insight.sort(paths, level=None, reverse=False)

Sort a list of filepaths.

This function sorts a list of filepaths. The sorting can be based on parts of the path (like folder or file names). This is done with the level argument.

  • paths (list) – A list of filepaths
  • level ((list of) int) – List of positions which refer to the axis items. Default None.
  • reverse (bool) – Reverse the sort.

A sorted list.



>>> path2insight.sort(data)
>>> path2insight.sort(data, level=1)
>>> path2insight.sort(data, level=1, reverse=True)
>>> path2insight.sort(data, level=[5, 4])
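
Sorting on one or more levels can be pictured as sorting on selected entries of the parts tuple. The helper sort_sketch below is a hypothetical illustration with pathlib.PurePosixPath as a stand-in. How the real function treats paths that are shorter than the requested level is not documented here, so the empty-string fallback is an assumption.

```python
from pathlib import PurePosixPath


def sort_sketch(paths, level=None, reverse=False):
    """Hypothetical helper: sort paths on all parts or on selected levels."""
    if level is None:
        return sorted(paths, key=lambda p: p.parts, reverse=reverse)
    levels = [level] if isinstance(level, int) else list(level)

    def key(p):
        # Assumption: paths that are too short get '' for that level.
        return tuple(p.parts[i] if i < len(p.parts) else "" for i in levels)

    return sorted(paths, key=key, reverse=reverse)


data = [PurePosixPath(s) for s in ("b/x/1.txt", "a/z/2.txt", "c/y/3.txt")]
by_all = [str(p) for p in sort_sketch(data)]
by_second = [str(p) for p in sort_sketch(data, level=1)]
by_second_desc = [str(p) for p in sort_sketch(data, level=1, reverse=True)]

print(by_all, by_second, by_second_desc)
```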

path2insight.sample(data, n=None)

Take a random sample of filepaths.

  • data (list) – A list of filepaths
  • n (int, optional) – The number of filepaths to return. If None, all filepaths are returned in a random order. Default None.

A list with a sample of filepaths.
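
The described behaviour (n items, or all items in random order when n is None) matches sampling without replacement. The helper sample_sketch below is hypothetical; its seed parameter is not part of the documented signature and exists only to make the sketch reproducible.

```python
import random


def sample_sketch(data, n=None, seed=None):
    """Hypothetical helper: random sample without replacement."""
    rng = random.Random(seed)  # ``seed`` is an addition for reproducibility
    k = len(data) if n is None else n
    return rng.sample(list(data), k)


paths = ["a.txt", "b.txt", "c.txt", "d.txt"]
two = sample_sketch(paths, n=2, seed=0)
shuffled = sample_sketch(paths, seed=0)

print(two, shuffled)
```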



path2insight.select(paths, **kwargs)

Select from a list of filepaths.

This function selects from a list of filepaths based on the names of their parts (like folder or file names). This is done with the level arguments.

  • paths (list) – A list of filepaths
  • level0 ((list of) str) – The value(s) of the first level (root).
  • level1 ((list of) str) – The value(s) of the second level.
  • level* ((list of) str) – The value(s) of the nth level.

A list with the selection of matching filepaths.




One can use the value “*” to select all file paths. If a file path doesn’t have a value on that level (because the level is higher than the number of parts), then the path is excluded from the selection. One can also use True instead of “*”.


Selection based on the name of a level.

>>> import path2insight
>>> data = [path2insight.WindowsFilePath("F:/data/file.txt")]
>>> path2insight.select(data, level1='data')

Selection based on the existence of a level (wildcard). A path is only included when the level exists.

>>> path2insight.select(data, level2='*')
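
The level keywords and the wildcard rule can be pictured as lookups in the parts tuple, where level0 is the root. The helper select_sketch below is a hypothetical illustration with pathlib.PureWindowsPath as a stand-in and made-up paths; it is not the library's implementation.

```python
from pathlib import PureWindowsPath


def select_sketch(paths, **levels):
    """Hypothetical helper: keep paths whose parts match the level filters."""
    selected = []
    for p in paths:
        keep = True
        for key, wanted in levels.items():
            index = int(key[len("level"):])  # 'level1' -> 1
            if index >= len(p.parts):
                keep = False  # the path has no part on this level
                break
            if wanted == "*" or wanted is True:
                continue  # wildcard: the level only has to exist
            allowed = [wanted] if isinstance(wanted, str) else list(wanted)
            if p.parts[index] not in allowed:
                keep = False
                break
        if keep:
            selected.append(p)
    return selected


data = [
    PureWindowsPath("F:/data/file.txt"),
    PureWindowsPath("F:/docs/2019/report.docx"),
    PureWindowsPath("F:/notes.txt"),
]
in_data = select_sketch(data, level1="data")
has_level2 = select_sketch(data, level2="*")

print(in_data)
print(has_level2)
```

The third path has only two parts, so the wildcard on level2 excludes it.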

path2insight.select_re(paths, **kwargs)

Select from a list of filepaths with regular expressions.

This function selects from a list of filepaths based on the names of their parts (like folder or file names). This is done with the level arguments, which are given as regular expressions.

  • paths (list) – A list of filepaths
  • level0 ((list of) str) – The value(s) of the first level (root).
  • level1 ((list of) str) – The value(s) of the second level.
  • level* ((list of) str) – The value(s) of the nth level.

A list with the selection of matching filepaths.




Selection based on the name of a level.

>>> import path2insight
>>> data = [path2insight.WindowsFilePath("F:/data/file.txt")]
>>> path2insight.select_re(data, level1=r"[A-Z]")

Select all paths whose level1 part starts with the letter d.

>>> path2insight.select_re(data, level1=r"^d")
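
A regular-expression variant of the level filter can be sketched as follows. The helper select_re_sketch is hypothetical and uses pathlib.PureWindowsPath as a stand-in. Whether the library uses re.search or re.match semantics is not documented here; the sketch assumes re.search.

```python
import re
from pathlib import PureWindowsPath


def select_re_sketch(paths, **levels):
    """Hypothetical helper: keep paths whose parts match regex filters."""
    compiled = {
        int(key[len("level"):]): re.compile(pattern)
        for key, pattern in levels.items()
    }
    return [
        p for p in paths
        if all(
            i < len(p.parts) and rx.search(p.parts[i])
            for i, rx in compiled.items()
        )
    ]


data = [
    PureWindowsPath("F:/data/file.txt"),
    PureWindowsPath("F:/Docs/report.docx"),
]
starts_with_d = select_re_sketch(data, level1=r"^d")
has_upper = select_re_sketch(data, level1=r"[A-Z]")

print(starts_with_d)
print(has_upper)
```

The patterns are case-sensitive in this sketch, so "^d" keeps the data folder but not Docs.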


path2insight.explore.stats.depth_counts(x, normalize=False, center=None)

Count the filepath-depths.

This function counts the filepath depths of a list of filepaths. The function returns a Python collections.Counter object. This Counter object can be used to compute the most common depths or subtract other Counter objects. For all options, see the Python documentation.

  • x (list, tuple, array of WindowsFilePath or PosixFilePath objects) – Paths to determine the depth of.
  • normalize (bool) – Normalize the Counter result. Default False.
  • center (str, NoneType, callable) – Method to correct the offset of the data. Options are ‘mean’ or callable. Default None.

filepath depths counted

Return type: collections.Counter


>>> path2insight.depth_counts(list_of_filepaths)
Counter({5: 32, 6: 654, 7: 284, 8: 13, 9: 11, 10: 1, 11: 4, 13: 1})

To get a Python dict, simply wrap the Counter object with dict().
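
The Counter operations mentioned above can be tried on the example output directly. In the sketch below, the second Counter named other is made up to show subtraction, and the shares dictionary is one possible reading of what normalize might produce (an assumption).

```python
from collections import Counter

# Depth counts copied from the example output of depth_counts.
depths = Counter({5: 32, 6: 654, 7: 284, 8: 13, 9: 11, 10: 1, 11: 4, 13: 1})

top_two = depths.most_common(2)
total = sum(depths.values())
shares = {depth: count / total for depth, count in depths.items()}

# Hypothetical second data set, used only to show Counter subtraction.
other = Counter({6: 600, 7: 300})
difference = depths - other  # keeps positive counts only

as_dict = dict(depths)

print(top_two, total, shares[6], difference[6])
```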

path2insight.explore.stats.drive_counts(x, lower=False, normalize=False)

Count the drives of the paths.

This function counts the drives of a list of filepaths. The function returns a Python collections.Counter object. This Counter object can be used to compute the most common drives or subtract other Counter objects. For all options, see the Python documentation.

  • x (list, tuple, array of WindowsFilePath or PosixFilePath objects) – Paths to count the drives of.
  • lower (boolean) – Convert the drive to lowercase before counting.
  • normalize (bool) – Normalize the Counter result. Default False.

drives counted

Return type: collections.Counter



To get a Python dict, simply wrap the Counter object with dict().
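
The effect of the lower and normalize options can be sketched with collections.Counter and the drive property. The function drive_counts_sketch is hypothetical and uses pathlib.PureWindowsPath as a stand-in. Dividing each count by the total is an assumption about what normalize means.

```python
from collections import Counter
from pathlib import PureWindowsPath


def drive_counts_sketch(paths, lower=False, normalize=False):
    """Hypothetical helper: count the drive of each path."""
    drives = [p.drive.lower() if lower else p.drive for p in paths]
    counts = Counter(drives)
    if normalize:
        total = sum(counts.values())
        # Assumption: normalising means dividing each count by the total.
        counts = Counter({d: c / total for d, c in counts.items()})
    return counts


paths = [
    PureWindowsPath(s)
    for s in ("C:/a.txt", "c:/b.txt", "D:/c.txt", "D:/d.txt")
]
raw = drive_counts_sketch(paths)
folded = drive_counts_sketch(paths, lower=True)
fractions = drive_counts_sketch(paths, lower=True, normalize=True)

print(raw, folded, fractions)
```

Without lower=True, the drives C: and c: are counted as two separate keys.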

path2insight.explore.stats.extension_chisquare(x, y=None, lower=True)

Calculates a one-way chi square test for file extensions.

  • x (list, tuple, array of WindowsFilePath or PosixFilePath objects) – Paths to compare with y.
  • y (list, tuple, array of WindowsFilePath or PosixFilePath objects) – Paths to compare with x.
  • lower (boolean) – Convert the extensions to lower before counting.

The test result.

Return type:


path2insight.explore.stats.extension_counts(x, lower=False, normalize=False)

Count the extensions of the filenames.

This function counts the file name extensions of a list of filepaths and returns a Python collections.Counter object. The Counter can be used to find the most common extensions or to subtract other Counter objects. For all options, see the Python documentation.

  • x (list, tuple, array of WindowsFilePath or PosixFilePath objects) – Paths to determine the extension count of.
  • lower (boolean) – Convert the extensions to lower before counting.
  • normalize (bool) – Normalize the Counter result. Default False.

Returns: extensions counted

Return type: collections.Counter


>>> path2insight.extension_counts(filepaths_list)
Counter({'.zip': 42, '.raw': 3, '.txt': 12, '.docx': 1})
>>> path2insight.extension_counts(filepaths_list).most_common(3)
[('.zip', 42), ('.txt', 12), ('.raw', 3)]

To get a Python dict, simply wrap the Counter object with dict().
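
The effect of the lower and normalize options can be sketched with the standard library alone. The helper `count_extensions` below is a hypothetical stand-in, not the path2insight implementation, and it assumes that normalizing means dividing each count by the total number of paths.

```python
from collections import Counter
from pathlib import PurePosixPath


def count_extensions(paths, lower=False, normalize=False):
    """Count file extensions (illustrative stand-in, not the library code)."""
    extensions = [PurePosixPath(p).suffix for p in paths]
    if lower:
        extensions = [ext.lower() for ext in extensions]
    counts = Counter(extensions)
    if normalize:
        total = sum(counts.values())
        # Assumption: normalizing means relative frequencies that sum to 1.
        counts = Counter({ext: n / total for ext, n in counts.items()})
    return counts


paths = ["/data/a.ZIP", "/data/b.zip", "/data/notes.txt", "/data/c.zip"]
raw_counts = count_extensions(paths)               # '.ZIP' and '.zip' stay separate
folded_counts = count_extensions(paths, lower=True)  # case folded before counting
shares = count_extensions(paths, lower=True, normalize=True)
```

Without lower=True, extensions that differ only in case end up in separate buckets, which is usually not what is wanted for files collected from Windows systems.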


[CHANGE FUNCTION NAME]Count the number of extensions.

path2insight.explore.stats.name_chisquare(x, y=None, lower=True)

Calculates a one-way chi square test for file names.

  • x (list, tuple, array of WindowsFilePath or PosixFilePath objects) – Paths to compare with y.
  • y (list, tuple, array of WindowsFilePath or PosixFilePath objects) – Paths to compare with x.
  • lower (boolean) – Convert the names to lower case before counting.

The test result.

Return type:


path2insight.explore.stats.name_counts(x, lower=False, normalize=False)

Count the names.

This function counts the names of a list of filepaths and returns a Python collections.Counter object. The Counter can be used to find the most common names or to subtract other Counter objects. For all options, see the Python documentation.

  • x (list, tuple, array of WindowsFilePath or PosixFilePath objects) – Paths to count the names of.
  • lower (boolean) – Convert the filenames to lower before counting.
  • normalize (bool) – Normalize the Counter result. Default False.

Returns: names counted

Return type: collections.Counter



To get a Python dict, simply wrap the Counter object with dict().

path2insight.explore.stats.stem_chisquare(x, y=None, lower=True)

Calculates a one-way chi square test for file name stems.

  • x (list, tuple, array of WindowsFilePath or PosixFilePath objects) – Paths to compare with y.
  • y (list, tuple, array of WindowsFilePath or PosixFilePath objects) – Paths to compare with x.
  • lower (boolean) – Convert the stems to lower case before counting.

The test result.

Return type:


path2insight.explore.stats.stem_counts(x, lower=False, normalize=False)

Count the stems.

This function counts the stems of a list of filepaths and returns a Python collections.Counter object. The Counter can be used to find the most common stems or to subtract other Counter objects. For all options, see the Python documentation.

  • x (list, tuple, array of WindowsFilePath or PosixFilePath objects) – Paths to count the stems of.
  • lower (boolean) – Convert the stems to lower before counting.
  • normalize (bool) – Normalize the Counter result. Default False.

Returns: stems counted

Return type: collections.Counter



To get a Python dict, simply wrap the Counter object with dict().

path2insight.explore.stats.token_counts(x, tokenizer=<function default_tokenizer>, lower=False, parents=False, stem=True, extension=False, normalize=False)

Count the tokens in the paths.

This function counts the tokens of a list of filepaths. Use the boolean settings to include the parents, stem and extension. The function returns a Python collections.Counter object. The Counter can be used to find the most common tokens or to subtract other Counter objects. For all options, see the Python documentation.

  • x (list, tuple, array of WindowsFilePath or PosixFilePath objects) – Paths to count the tokens of.
  • parents (bool) – Tokenize the parents.
  • stem (bool) – Tokenize the stem.
  • extension (bool) – Tokenize the extension.
  • lower (boolean) – Convert the filepath to lower case before counting.
  • normalize (bool) – Normalize the Counter result. Default False.

Returns: tokens counted

Return type: collections.Counter


>>> path2insight.token_counts(data).most_common(3)
[('FUNC001', 288), ('LTQ', 173), ('FUNCTNS', 96)]

To get a Python dict, simply wrap the Counter object with dict().
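
How the parents, stem and extension switches change what gets counted can be sketched as follows. Everything here is an assumption made for illustration: `simple_tokenizer` splits on runs of non-alphanumeric characters, which may differ from the library's default_tokenizer, and `count_tokens` is a hypothetical stand-in for token_counts.

```python
import re
from collections import Counter
from pathlib import PurePosixPath


def simple_tokenizer(text):
    """Split on runs of non-alphanumeric characters (assumed rule)."""
    return [tok for tok in re.split(r"[^0-9A-Za-z]+", text) if tok]


def count_tokens(paths, parents=False, stem=True, extension=False, lower=False):
    """Count tokens in the selected parts of each path (illustrative only)."""
    counts = Counter()
    for raw in paths:
        path = PurePosixPath(raw)
        pieces = []
        if parents:
            pieces.extend(path.parent.parts)
        if stem:
            pieces.append(path.stem)
        if extension:
            pieces.append(path.suffix)
        for piece in pieces:
            if lower:
                piece = piece.lower()
            counts.update(simple_tokenizer(piece))
    return counts


paths = ["/raw/LTQ_run_01.raw", "/raw/LTQ_run_02.raw", "/doc/read me.txt"]
stem_tokens = count_tokens(paths)
extension_tokens = count_tokens(paths, stem=False, extension=True)
parent_tokens = count_tokens(paths, parents=True, stem=False)
```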

path2insight.explore.metrics.distance_on_depth(x, y=None, metric='l2', n_jobs=1)

Compute the distance between filenames based on the depth.

The distance between filenames is computed based on the difference in depth between the filenames.

  • x (list) – A list of filepath objects
  • y (list) – A list of filepath objects to compare x with. If y is None, the internal similarity of the filepaths in x is computed.
  • metric (string, or callable) – The distance metric like ‘cityblock’, ‘cosine’, ‘euclidean’, ‘l1’, ‘l2’, ‘manhattan’. See http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.pairwise_distances.html for all possible metrics. Default ‘l2’.
  • n_jobs (int) – The number of cores to use during the computation of the metric. Default 1.
>>> from path2insight.explore import distance_on_depth
>>> import seaborn as sns
>>> d = distance_on_depth(DATASET)
>>> sns.heatmap(d)

For visual inspection, the heatmap function in seaborn can be useful.
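
What a depth-based distance matrix contains can be sketched without scikit-learn. The helpers below are hypothetical, and the definition of depth (the number of path components, root included) is an assumption. For a single numeric feature such as depth, the ‘l1’ and ‘l2’ metrics both reduce to the absolute difference.

```python
from pathlib import PurePosixPath


def depth(path):
    """Number of components in the path (assumed definition of depth)."""
    return len(PurePosixPath(path).parts)


def depth_distance_matrix(paths):
    """Pairwise absolute difference in depth between all paths."""
    depths = [depth(p) for p in paths]
    return [[abs(a - b) for b in depths] for a in depths]


paths = ["/a/b/c.txt", "/a/d.txt", "/a/b/c/e/f.txt"]
matrix = depth_distance_matrix(paths)
```

The matrix is symmetric with zeros on the diagonal, which is the shape a heatmap function expects.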

path2insight.explore.metrics.distance_on_extension(x, y=None, tokenizer=None, metric='l2', n_jobs=1)

Compute the distance between filenames based on the extension.

The distance between filenames is computed based on the number of extensions that both filenames have in common.

  • x (list) – A list of filepath objects
  • y (list) – A list of filepath objects to compare x with. If y is None, the internal similarity of the filepaths in x is computed.
  • metric (string, or callable) – The distance metric like ‘cityblock’, ‘cosine’, ‘euclidean’, ‘l1’, ‘l2’, ‘manhattan’. See http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.pairwise_distances.html for all possible metrics. Default ‘l2’.
  • n_jobs (int) – The number of cores to use during the computation of the metric. Default 1.
>>> from path2insight.explore import distance_on_extension
>>> import seaborn as sns
>>> d = distance_on_extension(DATASET)
>>> sns.heatmap(d)

For visual inspection, the heatmap function in seaborn can be useful.

path2insight.explore.metrics.distance_on_token(x, y=None, tokenizer=None, metric='l2', n_jobs=1)

Compute the distance between filenames based on tokens.

The distance between filenames is computed based on the number of tokens that both filenames have in common.

  • x (list) – A list of filepath objects
  • y (list) – A list of filepath objects to compare x with. If y is None, the internal similarity of the filepaths in x is computed.
  • tokenizer (callable) – Not implemented yet.
  • metric (string, or callable) – The distance metric like ‘cityblock’, ‘cosine’, ‘euclidean’, ‘l1’, ‘l2’, ‘manhattan’. See http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.pairwise_distances.html for all possible metrics. Default ‘l2’.
  • n_jobs (int) – The number of cores to use during the computation of the metric. Default 1.
>>> from path2insight.explore import distance_on_token
>>> import seaborn as sns
>>> d = distance_on_token(DATASET)
>>> sns.heatmap(d)

For visual inspection, the heatmap function in seaborn can be useful.
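
One way to turn shared tokens into a distance is to encode each path as a binary vector over the token vocabulary and take the Euclidean (‘l2’) distance between those vectors. The sketch below assumes that encoding and a simple split on non-alphanumeric characters; both are illustrative choices, not confirmed details of distance_on_token.

```python
import math
import re
from pathlib import PurePosixPath


def tokens_of(path):
    """Set of tokens in the file name stem (assumed tokenization rule)."""
    stem = PurePosixPath(path).stem
    return {tok for tok in re.split(r"[^0-9A-Za-z]+", stem) if tok}


def token_distance_matrix(paths):
    """Pairwise Euclidean distance between binary token vectors."""
    token_sets = [tokens_of(p) for p in paths]
    vocabulary = sorted(set().union(*token_sets))
    # One indicator vector per path: 1 if the token occurs, else 0.
    vectors = [[1 if tok in s else 0 for tok in vocabulary] for s in token_sets]
    return [[math.dist(u, v) for v in vectors] for u in vectors]


paths = ["/x/demo_code.py", "/x/demo_test.py", "/x/report.txt"]
matrix = token_distance_matrix(paths)
```

Paths that share a token (here the first two, which share "demo") end up closer together than paths with no tokens in common.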



Make tokens from camelCase strings

Parameters:x (WindowsFilePath, PosixFilePath, str) – The filepath or string.
Returns:list of tokens (strings)
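
A camelCase split can be sketched with a single regular expression. The function name `split_camel_case` and the exact rules (runs of capitals are kept together, digits form their own token) are assumptions for illustration and may differ from the tokenizer documented here.

```python
import re

# Alternatives, in order: an acronym followed by a capitalized word,
# an optionally capitalized lower-case word, a run of capitals, digits.
_CAMEL = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|[0-9]+")


def split_camel_case(text):
    """Return the camelCase tokens found in text, in order."""
    return _CAMEL.findall(text)


tokens = split_camel_case("camelCaseString")
acronym_tokens = split_camel_case("HTTPResponse2")
```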

Make tokens of a file path or string.

Parameters:x (WindowsFilePath, PosixFilePath, str) – The filepath or string.
Returns:list of tokens (strings)

Make parts of a file path.

Parameters:x (WindowsFilePath, PosixFilePath, str) – The filepath.
Returns:list of path parts (strings)

Make tokens from title formatted strings

Parameters:x (WindowsFilePath, PosixFilePath, str) – The filepath or string.
Returns:list of tokens (strings)


This module implements a filepath tagger. The object structure of this filepath tagger is based on the tagger objects in the Natural Language Toolkit (NLTK).

class path2insight.explore.tagger.BaseTypeTagger(tokenizer=None, tag_names=None)

The base class for the type taggers.

  • tokenizer (callable) – A tokenizer function to split the filepath parts.
  • tag_names (list) – The names of the four tags that this tagger uses. The default tags are drive="DRV", folder="FLD", stem="STM" and extension="EXT".

Return a list with the parts/tokens and their tags.

Parameters:x (list) – A list with WindowsFilePath and PosixFilePath objects.
Returns:A list of lists for which each nested list is a 2-tuple of name and tag.
class path2insight.explore.tagger.CompressionTagger(tags=..., na_tag='', ignore_case=True, use_wildcards=True)

Extension tagger for compression and archiving.

This tagger tags compressed file paths based on their extension. There are three different types of tags in this tagger. The tags are:

  • tags (dict) – A dict with the extensions to tag. The keys of the dict are the tags and the values of the dict are lists with extensions.
  • na_tag (str, None, object) – The tag for an extension that is not in the tags dictionary. Default ‘’.
  • ignore_case (bool) – Case-insensitive extension tagging. Default True.
  • use_wildcards (bool) – Use Unix shell-style wildcards like * and ?. Default True.


Return a list with the extension tag for each file path.

Parameters:x (list) – A list with WindowsFilePath and PosixFilePath objects.
Returns:A list with the tag(s) for each filepath.
class path2insight.explore.tagger.DocumentTagger(tags=..., na_tag='', ignore_case=True, use_wildcards=True)

Extension tagger for documents.

This tagger tags document file paths based on their extension. There are three different types of tags in this tagger. The tags are:

  • tags (dict) – A dict with the extensions to tag. The keys of the dict are the tags and the values of the dict are lists with extensions.
  • na_tag (str, None, object) – The tag for an extension that is not in the tags dictionary. Default ‘’.
  • ignore_case (bool) – Case-insensitive extension tagging. Default True.
  • use_wildcards (bool) – Use Unix shell-style wildcards like * and ?. Default True.


Return a list with the extension tag for each file path.

Parameters:x (list) – A list with WindowsFilePath and PosixFilePath objects.
Returns:A list with the tag(s) for each filepath.
class path2insight.explore.tagger.ExtensionTagger(tags={}, na_tag='', ignore_case=False, use_wildcards=True)

Extension tagger based on dict of tags.

Unix shell-style wildcards like * and ? are supported.

  • tags (dict) – A dict with the extensions to tag. The keys of the dict are the tags and the values of the dict are lists with extensions.
  • na_tag (str, None, object) – The tag for an extension that is not in the tags dictionary. Default ‘’.
  • ignore_case (bool) – Case-insensitive extension tagging. Default False.
  • use_wildcards (bool) – Use Unix shell-style wildcards like * and ?. Default True.

Use an OrderedDict if the precedence of the tags matters.


Return a list with the extension tag for each file path.

Parameters:x (list) – A list with WindowsFilePath and PosixFilePath objects.
Returns:A list with the tag(s) for each filepath.
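
The interplay of tags, na_tag, ignore_case and use_wildcards can be sketched with the standard fnmatch module. The function `tag_extensions` is a hypothetical stand-in, and it assumes that the first matching tag wins, so the order of the tags dict matters; the real tagger may return several tags per path.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def _first_tag(ext, tags, na_tag, ignore_case, use_wildcards):
    """Return the first tag whose patterns match ext, else na_tag."""
    if ignore_case:
        ext = ext.lower()
    for tag, patterns in tags.items():
        for pattern in patterns:
            if ignore_case:
                pattern = pattern.lower()
            if use_wildcards and fnmatchcase(ext, pattern):
                return tag
            if not use_wildcards and ext == pattern:
                return tag
    return na_tag


def tag_extensions(paths, tags, na_tag="", ignore_case=False, use_wildcards=True):
    """Tag each path by its extension (illustrative stand-in)."""
    return [
        _first_tag(PurePosixPath(p).suffix, tags, na_tag, ignore_case, use_wildcards)
        for p in paths
    ]


tags = {"ARCHIVE": [".zip", ".tar*"], "TEXT": [".tx?"]}
paths = ["/d/a.zip", "/d/b.TXT", "/d/c.py", "/d/e.tar"]
case_sensitive = tag_extensions(paths, tags)
case_insensitive = tag_extensions(paths, tags, ignore_case=True)
```
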
class path2insight.explore.tagger.FolderTagger

[EXPERIMENTAL] A tagger that assigns a FOLDER or FILE tag to each path.

>>> from path2insight import WindowsFilePath
>>> from path2insight.explore import FolderTagger
>>> folder_tagger = FolderTagger()
>>> list(folder_tagger.tag([WindowsFilePath('D:/armel/file.xyz')]))
[(WindowsFilePath('D:/armel/file.xyz'), 'FILE')]
class path2insight.explore.tagger.Tagger

Base class for the taggers.

class path2insight.explore.tagger.TokenTypeTagger(tokenizer=<function default_tokenizer>, tag_names=None)

A tagger that tags each filepath part (and extension) with the following labels: drive (DRV), folder (FLD), stem (STEM) and extension (EXT).

  • tokenizer (callable) – A function that converts a filepath or string into tokens.
  • tag_names (list) – The names of the four tags that this tagger uses. The default tags are drive="DRV", folder="FLD", stem="STM" and extension="EXT".

Return a list with the parts/tokens and their tags.

Parameters:x (list) – A list with WindowsFilePath and PosixFilePath objects.
Returns:A list of lists for which each nested list is a 2-tuple of name and tag.
class path2insight.explore.tagger.TypeTagger(tag_names=None)

A tagger that tags each filepath part (and extension) with the following labels: drive (DRV), folder (FLD), stem (STEM) and extension (EXT).

Parameters:tag_names (list) – The names of the four tags that this tagger uses. The default tags are drive="DRV", folder="FLD", stem="STM" and extension="EXT".

Return a list with the parts/tokens and their tags.

Parameters:x (list) – A list with WindowsFilePath and PosixFilePath objects.
Returns:A list of lists for which each nested list is a 2-tuple of name and tag.
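
The shape of the result, a list of (name, tag) tuples per path, can be sketched with pathlib. The function `tag_types` is hypothetical; it uses the default tag names listed in the parameters and assumes that every component between the drive and the file name is tagged as a folder.

```python
from pathlib import PureWindowsPath


def tag_types(paths, tag_names=("DRV", "FLD", "STM", "EXT")):
    """Tag drive, folders, stem and extension of each path (illustrative)."""
    drive_tag, folder_tag, stem_tag, ext_tag = tag_names
    tagged = []
    for raw in paths:
        path = PureWindowsPath(raw)
        parts = []
        if path.drive:
            parts.append((path.drive, drive_tag))
        # Every parent component except the anchor (drive + root) is a folder.
        parts.extend(
            (part, folder_tag) for part in path.parent.parts if part != path.anchor
        )
        parts.append((path.stem, stem_tag))
        if path.suffix:
            parts.append((path.suffix, ext_tag))
        tagged.append(parts)
    return tagged


result = tag_types(["D:/armel/file.xyz"])
```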



path2insight.collect.walk(d, delay=None, **kwargs)

Walk the file system like os.walk.

Function to collect file paths from the file system. This function is useful for collecting and sharing the file paths. The function is similar to os.walk.

  • d (str) – The path to the directory.
  • delay ((list of) str) – Time delay between requesting paths.
  • kwargs – Additional kwargs for os.walk.

Returns: The file paths and folder paths, as a tuple (files, folders).

Return type: (list, list)


Collect and share filepaths with pandas.

>>> import pandas as pd
>>> import path2insight
>>> files, folders = path2insight.walk('.')
>>> pd.DataFrame(files).to_csv("export_filepaths.csv", index=False)
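
Since the function is described as similar to os.walk, the (files, folders) result can be approximated with the standard library. The helper `collect_paths` below is a hypothetical sketch, not the library code; in particular it assumes delay is a number of seconds to pause after each directory, which the documentation above does not confirm.

```python
import os
import tempfile
import time


def collect_paths(directory, delay=None, **kwargs):
    """Collect file and folder paths below directory with os.walk."""
    files, folders = [], []
    for root, dirnames, filenames in os.walk(directory, **kwargs):
        folders.extend(os.path.join(root, name) for name in dirnames)
        files.extend(os.path.join(root, name) for name in filenames)
        if delay:
            # Assumption: delay is a pause in seconds between directories.
            time.sleep(delay)
    return files, folders


# Demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    open(os.path.join(tmp, "sub", "demo.txt"), "w").close()
    demo_files, demo_folders = collect_paths(tmp)
    demo_file_names = [os.path.relpath(p, tmp) for p in demo_files]
    demo_folder_names = [os.path.relpath(p, tmp) for p in demo_folders]
```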


Path2Insight comes with several datasets. These are real, publicly available datasets, accessible through the submodule ‘datasets’. See the example below:

from path2insight.datasets import load_pride
path2insight.datasets.external.load_ensembl(nrows=None, skiprows=None)

Load the filepaths of the Ensembl dataset (release 90).

“Ensembl is a genome browser for vertebrate genomes that supports research in comparative genomics, evolution, sequence variation and transcriptional regulation. Ensembl annotate genes, computes multiple alignments, predicts regulatory function and collects disease data. Ensembl tools include BLAST, BLAT, BioMart and the Variant Effect Predictor (VEP) for all supported species. (ensembl.org)”

The filepaths from release-90 of this dataset are loaded with this function. The data can be found at ftp://ftp.ensembl.org/pub/release-90/. The snapshot was taken on 16 November 2017 with a Linux device with the ftp://ftp.ensembl.org/pub/release-90/ as a mounted drive.

  • nrows – Number of rows of file to read. Useful for reading pieces of large files
  • skiprows (list-like or integer or callable, default None) – Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of the file. See pandas.read_csv() for more information about this parameter.

A list of PosixFilePaths of the Ensembl dataset.



path2insight.datasets.external.load_pride(nrows=None, skiprows=None)

Load the filepaths of the PRIDE proteomics archive.

“The PRIDE PRoteomics IDEntifications (PRIDE) database is a centralized, standards compliant, public data repository for proteomics data, including protein and peptide identifications, post-translational modifications and supporting spectral evidence. PRIDE is a core member in the ProteomeXchange (PX) consortium, which provides a single point for submitting mass spectrometry based proteomics data to public-domain repositories. Datasets are submitted to PRIDE via ProteomeXchange and are handled by expert biocurators. (https://www.ebi.ac.uk/pride/archive/)”

The filepaths of this dataset are loaded with this function. The data can be found at ftp://ftp.pride.ebi.ac.uk/pride/data/archive/. The snapshot was taken on 6 February 2018 with a Linux device with ftp://ftp.pride.ebi.ac.uk/pride/data/archive/ as a mounted drive.

  • nrows – Number of rows of file to read. Useful for reading pieces of large files
  • skiprows (list-like or integer or callable, default None) – Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of the file. See pandas.read_csv() for more information about this parameter.

A list of PosixFilePaths of the PRIDE dataset.




path2insight.external.nltk.bigrams(sequence, **kwargs)

Return the bigrams generated from a sequence of items, as an iterator. For example:

>>> from path2insight.external.nltk import bigrams
>>> list(bigrams([1,2,3,4,5]))
[(1, 2), (2, 3), (3, 4), (4, 5)]

Wrap with list for a list version of this function.

Parameters:sequence (sequence or iter) – the source data to be converted into bigrams
Return type:iter(tuple)
path2insight.external.nltk.everygrams(sequence, min_len=1, max_len=-1, **kwargs)

Returns all possible ngrams generated from a sequence of items, as an iterator.

>>> from path2insight.external.nltk import everygrams
>>> sent = 'a b c'.split()
>>> list(everygrams(sent))
[('a',), ('b',), ('c',), ('a', 'b'), ('b', 'c'), ('a', 'b', 'c')]
>>> list(everygrams(sent, max_len=2))
[('a',), ('b',), ('c',), ('a', 'b'), ('b', 'c')]
  • sequence (sequence or iter) – the source data to be converted into ngrams
  • min_len (int) – minimum length of the ngrams, aka. n-gram order/degree of ngram
  • max_len (int) – maximum length of the ngrams (set to length of sequence by default)
Return type:iter(tuple)


path2insight.external.nltk.ngrams(sequence, n, pad_left=False, pad_right=False, left_pad_symbol=None, right_pad_symbol=None)

Return the ngrams generated from a sequence of items, as an iterator. For example:

>>> from path2insight.external.nltk import ngrams
>>> list(ngrams([1,2,3,4,5], 3))
[(1, 2, 3), (2, 3, 4), (3, 4, 5)]

Wrap with list for a list version of this function. Set pad_left or pad_right to true in order to get additional ngrams:

>>> list(ngrams([1,2,3,4,5], 2, pad_right=True))
[(1, 2), (2, 3), (3, 4), (4, 5), (5, None)]
>>> list(ngrams([1,2,3,4,5], 2, pad_right=True, right_pad_symbol='</s>'))
[(1, 2), (2, 3), (3, 4), (4, 5), (5, '</s>')]
>>> list(ngrams([1,2,3,4,5], 2, pad_left=True, left_pad_symbol='<s>'))
[('<s>', 1), (1, 2), (2, 3), (3, 4), (4, 5)]
>>> list(ngrams([1,2,3,4,5], 2, pad_left=True, pad_right=True, left_pad_symbol='<s>', right_pad_symbol='</s>'))
[('<s>', 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, '</s>')]
  • sequence (sequence or iter) – the source data to be converted into ngrams
  • n (int) – the degree of the ngrams
  • pad_left (bool) – whether the ngrams should be left-padded
  • pad_right (bool) – whether the ngrams should be right-padded
  • left_pad_symbol (any) – the symbol to use for left padding (default is None)
  • right_pad_symbol (any) – the symbol to use for right padding (default is None)
Return type:

sequence or iter
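
The padding behaviour shown in the examples above can be reproduced with a few lines of plain Python. The generator `ngrams_sketch` is a hypothetical re-implementation for illustration, not the code shipped in path2insight.external.nltk; it pads with n - 1 symbols on each requested side and then slides a window of length n over the result.

```python
def ngrams_sketch(sequence, n, pad_left=False, pad_right=False,
                  left_pad_symbol=None, right_pad_symbol=None):
    """Yield the n-grams of sequence as tuples (illustrative sketch)."""
    items = list(sequence)
    if pad_left:
        items = [left_pad_symbol] * (n - 1) + items
    if pad_right:
        items = items + [right_pad_symbol] * (n - 1)
    # Slide a window of length n over the (padded) items.
    for start in range(len(items) - n + 1):
        yield tuple(items[start:start + n])


plain = list(ngrams_sketch([1, 2, 3, 4, 5], 3))
padded = list(ngrams_sketch([1, 2, 3, 4, 5], 2, pad_right=True))
```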

path2insight.external.nltk.pad_sequence(sequence, n, pad_left=False, pad_right=False, left_pad_symbol=None, right_pad_symbol=None)

Returns a padded sequence of items before ngram extraction.

>>> list(pad_sequence([1,2,3,4,5], 2, pad_left=True, pad_right=True, left_pad_symbol='<s>', right_pad_symbol='</s>'))
['<s>', 1, 2, 3, 4, 5, '</s>']
>>> list(pad_sequence([1,2,3,4,5], 2, pad_left=True, left_pad_symbol='<s>'))
['<s>', 1, 2, 3, 4, 5]
>>> list(pad_sequence([1,2,3,4,5], 2, pad_right=True, right_pad_symbol='</s>'))
[1, 2, 3, 4, 5, '</s>']
  • sequence (sequence or iter) – the source data to be padded
  • n (int) – the degree of the ngrams
  • pad_left (bool) – whether the ngrams should be left-padded
  • pad_right (bool) – whether the ngrams should be right-padded
  • left_pad_symbol (any) – the symbol to use for left padding (default is None)
  • right_pad_symbol (any) – the symbol to use for right padding (default is None)
Return type:

sequence or iter

path2insight.external.nltk.skipgrams(sequence, n, k, **kwargs)

Returns all possible skipgrams generated from a sequence of items, as an iterator. Skipgrams are ngrams that allow tokens to be skipped. Refer to http://homepages.inf.ed.ac.uk/ballison/pdf/lrec_skipgrams.pdf

>>> sent = "Insurgents killed in ongoing fighting".split()
>>> list(skipgrams(sent, 2, 2))
[('Insurgents', 'killed'), ('Insurgents', 'in'), ('Insurgents', 'ongoing'), ('killed', 'in'), ('killed', 'ongoing'), ('killed', 'fighting'), ('in', 'ongoing'), ('in', 'fighting'), ('ongoing', 'fighting')]
>>> list(skipgrams(sent, 3, 2))
[('Insurgents', 'killed', 'in'), ('Insurgents', 'killed', 'ongoing'), ('Insurgents', 'killed', 'fighting'), ('Insurgents', 'in', 'ongoing'), ('Insurgents', 'in', 'fighting'), ('Insurgents', 'ongoing', 'fighting'), ('killed', 'in', 'ongoing'), ('killed', 'in', 'fighting'), ('killed', 'ongoing', 'fighting'), ('in', 'ongoing', 'fighting')]
  • sequence (sequence or iter) – the source data to be converted into skipgrams
  • n (int) – the degree of the ngrams
  • k (int) – the skip distance
Return type:iter(tuple)
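
The roles of n and k can be made concrete with a short sketch built on itertools.combinations. The generator `skipgrams_sketch` is a hypothetical re-implementation, not the code in path2insight.external.nltk: each skipgram starts at one item and draws its remaining n - 1 items, in order, from the next n + k - 1 items, so at most k items are skipped in total.

```python
from itertools import combinations


def skipgrams_sketch(sequence, n, k):
    """Yield skipgrams of length n with at most k skipped items."""
    items = list(sequence)
    for i, head in enumerate(items):
        # Candidates for the rest of the skipgram: the next n + k - 1 items.
        window = items[i + 1:i + n + k]
        for tail in combinations(window, n - 1):
            yield (head,) + tail


sent = "Insurgents killed in ongoing fighting".split()
pairs = list(skipgrams_sketch(sent, 2, 2))
triples = list(skipgrams_sketch(sent, 3, 2))
```

With k=0 the sketch yields ordinary ngrams, because the window then holds exactly the n - 1 items that follow the head.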


path2insight.external.nltk.trigrams(sequence, **kwargs)

Return the trigrams generated from a sequence of items, as an iterator. For example:

>>> from path2insight.external.nltk import trigrams
>>> list(trigrams([1,2,3,4,5]))
[(1, 2, 3), (2, 3, 4), (3, 4, 5)]

Wrap with list for a list version of this function.

Parameters:sequence (sequence or iter) – the source data to be converted into trigrams
Return type:iter(tuple)