Common API Reference
Strings and encodings
- dlisio.common.set_encodings(encodings)
Set codepages to use for decoding strings
RP66 specifies that all strings should be ASCII, meaning 7-bit. ASCII strings have an identical bitwise representation in UTF-8, and Python strings are Unicode. However, many files contain strings that aren’t ASCII but are encoded in some other way; a common example is the degree symbol [1]. Plenty of files use other encodings too.
LIS does not explicitly mention that strings should be ASCII, but it also doesn’t mention any encodings.
This function sets the code pages that dlisio will try in order when decoding the string-types specified by LIS and DLIS. UTF-8 will always be tried first, and is always correct if the file behaves according to spec.
Available encodings can be found in the Python docs [2].
If none of the encodings succeeds for a string, that string is returned as a bytes object.
- Parameters:
encodings (list of str) – Ordered list of encodings to try
- Warns:
UnicodeWarning – When no decode was successful, and a bytes object is returned
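The try-in-order behaviour described above can be sketched with the standard library alone. This is an illustration of the documented rules, not dlisio's actual implementation; the function name decode_with_fallback is hypothetical.

```python
import warnings

def decode_with_fallback(raw, encodings):
    """Hypothetical helper: try UTF-8 first, then each encoding in order."""
    for encoding in ['utf-8', *encodings]:
        try:
            # Strict decoding: any invalid byte rejects this encoding
            return raw.decode(encoding, errors='strict')
        except UnicodeDecodeError:
            continue
    # Nothing worked: warn and hand back the raw bytes unchanged
    warnings.warn('unable to decode string', UnicodeWarning)
    return raw

text = decode_with_fallback(b'custom unit\xb0', ['latin1'])  # 'custom unit°'
plain = decode_with_fallback(b'custom unit', [])             # 'custom unit'
```

Pure ASCII input never reaches the user-supplied encodings, because it is already valid UTF-8.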
Warning
There is no place in the LIS or DLIS spec to put or look for encoding information, so decoding is a guess. Plenty of strings are valid in multiple encodings, so there’s a high chance that decoding with the wrong encoding will give a valid string, but not the one the writer intended.
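To make the ambiguity concrete, the short sketch below decodes one byte string with three codecs from the Python standard library. The byte string is the one used in the Examples for this function; the choice of codecs is only illustrative.

```python
raw = b'custom unit\xb0'

# Every codec below accepts these bytes under strict error handling,
# yet each produces a different string. Nothing in the file itself
# says which result the writer intended.
decoded = {}
for encoding in ('latin1', 'cp437', 'utf-16-le'):
    decoded[encoding] = raw.decode(encoding, errors='strict')
    print(encoding, ascii(decoded[encoding]))
```

A successful decode therefore only shows that the bytes are valid in that encoding, not that the encoding is the right one.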
Warning
It is possible to change the encodings at any time. However, only strings created after the change will use the new encodings. Having strings that are out of sync with respect to encodings might lead to unexpected behaviour. It is recommended to reload the file after changing the encodings, so that all strings use the same encodings.
See also
get_encodings: currently set encodings
Notes
Strings are decoded using Python’s bytes.decode(errors='strict').
References
Examples
Decoding of the same string under different encodings
>>> from dlisio import dlis, common
>>> common.set_encodings([])
>>> with dlis.load('file.dlis') as (f, *_):
...     print(getchannel(f).units)
b'custom unit\xb0'
>>> common.set_encodings(['latin1'])
>>> with dlis.load('file.dlis') as (f, *_):
...     print(getchannel(f).units)
'custom unit°'
>>> common.set_encodings(['utf-16'])
>>> with dlis.load('file.dlis') as (f, *_):
...     print(getchannel(f).units)
'畣瑳浯甠楮끴'
- dlisio.common.get_encodings()
Get codepages to use for decoding strings
Get the currently set codepages used when decoding strings.
- Returns:
encodings
- Return type:
list
See also
Open
- dlisio.common.open(path, offset=0)
Open a file
Open a low-level file handle. This is not intended for end-users - rather, it’s an escape hatch for very broken files that dlisio cannot handle.
- Parameters:
path (str_like) –
offset (int) – Physical file offset at which handle must be opened
- Returns:
stream
- Return type:
dlisio.core.stream
See also
Error handling
- class dlisio.common.ErrorHandler
Defines rules about error handling
Many .dlis files are not compliant with the specification or are simply broken. This class gives the user some control over how such files are handled.
When dlisio encounters a specification violation, it categorises the issue based on the severity of the violation. Some issues are easy to ignore, while others might force dlisio to give up on its current task. ErrorHandler supplies an interface for changing how dlisio reacts to the different violations in the file.
Different categories are info, minor, major and critical:
Severity
Description
critical
Any issue that forces dlisio to stop its current objective prematurely is categorised as critical.
By default a critical error raises a RuntimeError.
An example would be file indexing, which happens at load. Suppose the indexing fails midway through the file. There is no way for dlisio to reliably keep indexing the file. However, it is likely that the file is readable up until the point of failure. Changing the behaviour of critical from raising an exception to logging would in this case mean that a partially indexed file is returned by load.
major
Result of a direct specification violation in the file. dlisio makes an assumption about what the broken information [1] should have been and continues parsing the file on that assumption. If no other major or critical issues are reported, it’s likely that the assumption was correct and that dlisio parsed the file correctly. However, no guarantees can be made.
By default a warning is logged.
[1] Note that “information” in this case refers to the data in the file that tells dlisio how the file should be parsed, not to the actual parsed data.
minor
Like major issues, this is also the result of a direct specification violation. dlisio makes similar assumptions to keep parsing the file. Minor issues are generally less severe and, in contrast to major issues, are more likely to be handled correctly. However, there are still no guarantees about the parsed data.
By default an info message is logged.
info
The issue doesn’t contradict the specification, but the situation is peculiar.
By default a debug message is logged.
ErrorHandler only applies to issues related to parsing information from the file. These are issues that otherwise would force dlisio to fail, such as direct violations of the RP66v1 specification. It does not apply to inconsistencies and issues in the parsed data. This means that cases where dlisio enforces behaviour of the parsed data, such as object-to-object references, are out of scope for the ErrorHandler.
Please also note that ErrorHandler doesn’t redefine the issue categories; it only changes the default behaviour for each category.
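The default behaviour per severity can be pictured as a simple mapping from category to action. The sketch below uses only the standard library and mirrors the defaults listed above; it is not how dlisio implements ErrorHandler, and the names DEFAULT_ACTIONS, raise_error and report are hypothetical.

```python
import logging

log = logging.getLogger('sketch')

def raise_error(msg):
    raise RuntimeError(msg)

# Defaults as described above: debug, info, warning, and raise
DEFAULT_ACTIONS = {
    'info':     log.debug,
    'minor':    log.info,
    'major':    log.warning,
    'critical': raise_error,
}

def report(severity, msg, actions=DEFAULT_ACTIONS):
    """Run the action configured for an issue of the given severity."""
    actions[severity](msg)

# Changing an action does not change how the issue is categorised:
# the issue is still critical, it is only logged instead of raised.
lenient = {**DEFAULT_ACTIONS, 'critical': log.error}
report('critical', 'indexing failed midway', lenient)
```

Under the default mapping the same call raises a RuntimeError, which corresponds to the default critical behaviour described in the table.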
- info
Action for a purely informational message
- minor
Action for a minor specification violation
- major
Action for a major specification violation
- critical
Action for a critical specification violation
Warning
Escaping errors is a good solution when you need to read as much data as possible, for example to get a general overview of the file. However, be careful when using this mode for close inspection. If you decide to accept errors, be aware that some of the returned data will be unreliable, most likely the data stored in the file near the point of failure.
Warning
Be careful not to ignore too much information when investigating files. If you want to debug a broken part of the file, you should look at all issues to get a full picture of the situation.
Examples
Define your own rules:
>>> import logging
>>> from dlisio.common import ErrorHandler, Actions
>>> def myhandler(msg):
...     logging.getLogger('custom').info("error in dlisio")
...     raise RuntimeError("Custom handler: " + msg)
...
>>> errorhandler = ErrorHandler(
...     info     = Actions.SWALLOW,
...     minor    = Actions.LOG_WARNING,
...     major    = Actions.RAISE,
...     critical = myhandler)
Parse a file:
>>> from dlisio import dlis
>>> files = dlis.load(path)
RuntimeError: "...."
>>> handler = ErrorHandler(critical=Actions.LOG_ERROR)
>>> files = dlis.load(path, error_handler=handler)
[ERROR] "...."
>>> for f in files:
...     pass
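When investigating a broken file, it can help to keep every reported message rather than only log it. The sketch below assumes, as the myhandler example above suggests, that an action may be any callable that accepts the message string. The IssueCollector class is hypothetical and is exercised here without dlisio.

```python
class IssueCollector:
    """Hypothetical action: store each reported message for later review."""

    def __init__(self):
        self.messages = []

    def __call__(self, msg):
        # Record the message instead of logging or raising
        self.messages.append(msg)

collector = IssueCollector()

# Stand-in for a call made by the parser when it reports an issue
collector('example issue reported by the parser')

for message in collector.messages:
    print(message)
```

Under that assumption, an instance could be supplied for one or more severities in the same way myhandler is supplied for critical, and the collected messages inspected after loading.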