Common API Reference#

Strings and encodings#

dlisio.common.set_encodings(encodings)#

Set codepages to use for decoding strings

RP66 specifies that all strings should be ASCII, meaning 7-bit. ASCII strings have an identical bitwise representation in UTF-8, so a compliant string always decodes correctly as UTF-8. However, a lot of files contain strings that aren’t ASCII but are encoded in some other way; a common example is the degree symbol [1]. Plenty of files use other encodings too.

LIS does not explicitly mention that strings should be ASCII, but it also doesn’t mention any encodings.

This function sets the code pages that dlisio will try in order when decoding the string-types specified by LIS and DLIS. UTF-8 will always be tried first, and is always correct if the file behaves according to spec.

Available encodings can be found in the Python docs [2].

If none of the encodings succeed, all strings will be returned as a bytes object.
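
The lookup order described above can be sketched in plain Python. This is an illustrative model of the documented behaviour, not dlisio's actual implementation, and the helper name `decode_with_fallback` is hypothetical.

```python
import warnings

def decode_with_fallback(raw, encodings):
    """Try UTF-8 first, then each encoding in order; fall back to bytes.

    Hypothetical helper that mirrors the behaviour documented for
    set_encodings. It is not part of dlisio.
    """
    for encoding in ['utf-8', *encodings]:
        try:
            # Strict decoding: an invalid byte raises instead of being replaced
            return raw.decode(encoding, errors='strict')
        except UnicodeDecodeError:
            continue
    # Nothing worked: warn and hand back the undecoded bytes
    warnings.warn('unable to decode string, returning bytes', UnicodeWarning)
    return raw

raw = b'custom unit\xb0'   # 0xb0 on its own is not valid UTF-8

plain = decode_with_fallback(b'plain ascii', [])   # decoded by the UTF-8 attempt
decoded = decode_with_fallback(raw, ['latin1'])    # UTF-8 fails, latin1 succeeds
```

With an empty encoding list the same `raw` value comes back unchanged as `bytes`, accompanied by a `UnicodeWarning`.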

Parameters:

encodings (list of str) – Ordered list of encodings to try

Warns:

UnicodeWarning – When no decode was successful, and a bytes object is returned

Warning

There is no place in the LIS or DLIS spec to put or look for encoding information, so decoding is a wild guess. Plenty of strings are valid in multiple encodings, so there’s a high chance that decoding with the wrong encoding will give a valid string, but not the one the writer intended.
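
A minimal illustration of that ambiguity, using only Python's built-in codecs (no dlisio involved): the same byte decodes without error under several single-byte code pages, and each one yields a different character. The byte value is chosen purely for illustration.

```python
raw = b'\xe9'  # a single non-ASCII byte, as might appear in a unit or a name

# Every one of these decodes succeeds under errors='strict'
candidates = {
    encoding: raw.decode(encoding, errors='strict')
    for encoding in ('latin1', 'cp1251', 'cp437')
}

# Escaped code points make the differences visible on any terminal
for encoding, text in candidates.items():
    print(encoding, ascii(text))
```

None of the three results is flagged as an error, which is why a successful decode says nothing about whether the chosen encoding was the right one.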

Warning

It is possible to change the encodings at any time. However, only strings created after the change will use the new encodings. Having strings that are out of sync with respect to encodings might lead to unexpected behaviour. It is recommended that the file is reloaded after changing the encodings to ensure that all strings use the same encoding.

See also

get_encodings

currently set encodings

Notes

Strings are decoded using Python’s bytes.decode(errors='strict').

References

Examples

Decoding of the same string under different encodings

>>> from dlisio import dlis, common
>>> # getchannel is a placeholder for however the channel of interest is looked up
>>> common.set_encodings([])
>>> with dlis.load('file.dlis') as (f, *_):
...     print(getchannel(f).units)
b'custom unit\xb0'
>>> common.set_encodings(['latin1'])
>>> with dlis.load('file.dlis') as (f, *_):
...     print(getchannel(f).units)
custom unit°
>>> common.set_encodings(['utf-16'])
>>> with dlis.load('file.dlis') as (f, *_):
...     print(getchannel(f).units)
畣瑳浯甠楮끴

dlisio.common.get_encodings()#

Get codepages to use for decoding strings

Get the currently set codepages used when decoding strings.

Returns:

encodings

Return type:

list

See also

set_encodings
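
Because the encoding list is global state, one way to limit the effect of a change is to save the current list and restore it afterwards. The sketch below is a hypothetical helper (`temporary_encodings` is not part of dlisio). It takes the getter and setter as arguments so that it runs without dlisio installed; in real use one would presumably pass `common.get_encodings` and `common.set_encodings`. The `fake_get` and `fake_set` functions are stand-ins for demonstration only.

```python
from contextlib import contextmanager

@contextmanager
def temporary_encodings(getter, setter, encodings):
    """Set encodings for the duration of a with-block, then restore them.

    Hypothetical helper, not part of dlisio. `getter` and `setter` are
    expected to behave like get_encodings and set_encodings.
    """
    previous = getter()
    setter(encodings)
    try:
        yield
    finally:
        # Restore the earlier list even if the block raises
        setter(previous)

# Stand-ins for dlisio.common.get_encodings / set_encodings
_state = {'encodings': []}

def fake_get():
    return list(_state['encodings'])

def fake_set(encodings):
    _state['encodings'] = list(encodings)

with temporary_encodings(fake_get, fake_set, ['latin1']):
    inside = fake_get()   # ['latin1'] while the block is active
after = fake_get()        # back to the previous list
```

As the warning on set_encodings notes, strings keep the encoding they were created with, so any file that should use the temporary encodings needs to be loaded inside the block.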

Open#

dlisio.common.open(path, offset=0)#

Open a file

Open a low-level file handle. This is not intended for end-users - rather, it’s an escape hatch for very broken files that dlisio cannot handle.

Parameters:
  • path (str_like) –

  • offset (int) – Physical file offset at which handle must be opened

Returns:

stream

Return type:

dlisio.core.stream

Error handling#

class dlisio.common.ErrorHandler#

Defines rules about error handling

Many .dlis files are not compliant with the specification or are simply broken. This class gives the user some control over how such files are handled.

When dlisio encounters a specification violation, it categorises the issue based on the severity of the violation. Some issues are easy to ignore, while others might force dlisio to give up on its current task. ErrorHandler supplies an interface for changing how dlisio reacts to different violations in the file.

Different categories are info, minor, major and critical:

Severity

Description

critical

Any issue that forces dlisio to stop its current objective prematurely is categorised as critical.

By default a critical error raises a RuntimeError.

An example would be file indexing, which happens at load. Suppose the indexing fails midway through the file. There is no way for dlisio to reliably keep indexing the file. However, it is likely that the file is readable up until the point of failure. Changing the behaviour of critical from raising an exception to logging would in this case mean that a partially indexed file is returned by load.

major

Result of a direct specification violation in the file. dlisio makes an assumption about what the broken information [1] should have been and continues parsing the file on this assumption. If no other major or critical issues are reported, it’s likely that the assumption was correct and that dlisio parsed the file correctly. However, no guarantees can be made.

By default a warning is logged.

[1] Note that “information” in this case refers to the data in the file that tells dlisio how the file should be parsed, not to the actual parsed data.

minor

Like major issues, this is also the result of a direct specification violation. dlisio makes similar assumptions to keep parsing the file. Minor issues are generally less severe and, in contrast to major issues, are more likely to be handled correctly. However, still no guarantees can be made about the parsed data.

By default an info message is logged.

info

The issue doesn’t contradict the specification, but the situation is peculiar.

By default a debug message is logged.

ErrorHandler only applies to issues related to parsing information from the file. These are issues that otherwise would force dlisio to fail, such as direct violations of the RP66v1 specification. It does not apply to inconsistencies and issues in the parsed data. This means that cases where dlisio enforces behaviour of the parsed data, such as object-to-object references, are out of scope for the ErrorHandler.

Please also note that ErrorHandler doesn’t redefine issue categories; it only changes the default behaviour.
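
The default mapping from severity to action described above can be modelled with the standard `logging` module. This is only an illustrative model, under the assumption that each action is a callable taking a message. `DEFAULT_ACTIONS`, `raise_error` and `report` are hypothetical names and do not reflect dlisio's internals.

```python
import logging

def raise_error(msg):
    raise RuntimeError(msg)

# Defaults as documented: critical raises, major logs a warning,
# minor logs an info message, info logs a debug message
DEFAULT_ACTIONS = {
    'info':     logging.debug,
    'minor':    logging.info,
    'major':    logging.warning,
    'critical': raise_error,
}

def report(severity, msg, overrides=None):
    """Run the action configured for `severity` (hypothetical helper)."""
    actions = {**DEFAULT_ACTIONS, **(overrides or {})}
    actions[severity](msg)

# A major issue is logged and execution continues
report('major', 'assumed a value for broken information')

# A critical issue stops the current task by default
try:
    report('critical', 'indexing failed')
except RuntimeError as exc:
    outcome = str(exc)

# Downgrading critical to a log message lets the caller carry on
report('critical', 'indexing failed', overrides={'critical': logging.error})
```

Note that the override changes only the reaction; the issue is still classified as critical, which matches the point that categories themselves are not redefined.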

info#

Action for purely informational messages

minor#

Action for minor specification violation

major#

Action for major specification violation

critical#

Action for critical specification violation

Warning

Escaping errors is a good solution when the user needs to read as much data as possible, for example to get a general overview of the file. However, the user must be careful when using this mode for close inspection. A user who decides to accept errors must be aware that some of the returned data will be spoiled, most likely the data stored in the file near the failure.

Warning

Be careful not to ignore too much information when investigating files. If you want to debug a broken part of the file, you should look at all issues to get a full picture of the situation.

Examples

Define your own rules:

>>> import logging
>>> from dlisio.common import ErrorHandler, Actions
>>> def myhandler(msg):
...     logging.getLogger('custom').info("error in dlisio")
...     raise RuntimeError("Custom handler: " + msg)
>>> errorhandler = ErrorHandler(
...     info     = Actions.SWALLOW,
...     minor    = Actions.LOG_WARNING,
...     major    = Actions.RAISE,
...     critical = myhandler)

Parse a file:

>>> from dlisio import dlis
>>> files = dlis.load(path)
RuntimeError: "...."
>>> handler = ErrorHandler(critical=Actions.LOG_ERROR)
>>> files = dlis.load(path, error_handler=handler)
[ERROR] "...."
>>> for f in files:
...  pass
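
The first example passes a plain function as an action. Assuming any callable that accepts the message string works the same way (the examples above only show a function), a small collector object could gather issues for later inspection instead of logging them. `IssueCollector` is a hypothetical name, not part of dlisio.

```python
class IssueCollector:
    """Callable that records every message it receives (hypothetical helper)."""

    def __init__(self):
        self.messages = []

    def __call__(self, msg):
        self.messages.append(msg)

collector = IssueCollector()

# With dlisio installed, this would be wired in roughly as:
#   handler = ErrorHandler(major=collector, minor=collector)
#   files = dlis.load(path, error_handler=handler)
# Here the collector is called directly to show what it stores.
collector('example violation')
collector('another violation')
```

After loading, `collector.messages` holds every reported issue in order, which fits the advice above to look at all issues when debugging a broken part of a file.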

class dlisio.common.Actions#

Actions available for various specification violations

LOG_DEBUG()#

logging.debug

LOG_ERROR()#

logging.error

LOG_INFO()#

logging.info

LOG_WARNING()#

logging.warning

RAISE()#

raise RuntimeError

SWALLOW()#

pass