Frequently Asked Questions (FAQ)
This page collects the most common questions about using cfinterface,
organized by topic. For complete details on each class and method,
see the API reference in the Module Reference section and the
architecture overview.
When should I use Register, Block, or Section?
The choice depends on the structure of the file you need to model:
Register – use when each relevant line in the file is identified by a fixed prefix (e.g., "NR", "DA"). The Register compares the beginning of each line against its IDENTIFIER attribute to decide which record type to process. Use RegisterFile as the file class.
Block – use when the relevant data lies between a start line and an end line identifiable by regular expressions (e.g., BEGIN_PATTERN = r"^SECTION" and END_PATTERN = r"^END"). A Block leaves the implementation of read() and write() to the developer. Use BlockFile as the file class.
Section – use when the file is divided into ordered, sequential blocks without begin and end delimiters. Each section is processed in the order in which it appears in the file class’s SECTIONS list. Use SectionFile as the file class.
In summary:
| Model | When to use |
|---|---|
| Register | Lines with a fixed prefix identifier |
| Block | Blocks with begin and end delimiters |
| Section | Ordered sections without delimiters |
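The practical difference between the Block and Section models is how lines get assigned to components. The stdlib-only sketch below illustrates it; find_blocks and split_sections are hypothetical helpers, not cfinterface APIs, and giving each section a fixed number of lines is a simplification made only for this example.

```python
# Hypothetical helpers for illustration only; not part of cfinterface.
import re
from itertools import islice
from typing import Iterable, List


def find_blocks(lines: Iterable[str], begin: str, end: str) -> List[List[str]]:
    """Block-style: collect each run of lines from a `begin` match to an `end` match."""
    blocks: List[List[str]] = []
    current: List[str] = []
    inside = False
    for line in lines:
        if not inside:
            # Lines outside any block are ignored until a begin pattern matches
            if re.match(begin, line):
                inside = True
                current = [line]
        else:
            current.append(line)
            if re.match(end, line):
                blocks.append(current)
                inside = False
    # A block still open at the end of the input is discarded in this sketch
    return blocks


def split_sections(lines: Iterable[str], sizes: List[int]) -> List[List[str]]:
    """Section-style: hand out consecutive lines in declaration order (assumed fixed sizes)."""
    it = iter(lines)
    return [list(islice(it, size)) for size in sizes]


text = ["header", "SECTION A", "x = 1", "END", "noise", "SECTION B", "END"]
print(find_blocks(text, r"^SECTION", r"^END"))
# [['SECTION A', 'x = 1', 'END'], ['SECTION B', 'END']]
print(split_sections(["a", "b", "c", "d"], [1, 2, 1]))
# [['a'], ['b', 'c'], ['d']]
```

Block matching skips anything outside the delimiters, while the section split relies purely on position and order, which is why Section needs no begin/end patterns.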
How do I handle binary files?
Set STORAGE = StorageType.BINARY in your file class. The framework
will open the file in binary mode and delegate the reading and writing of
each field to the _binary_read and _binary_write methods of the
declared fields.
```python
from cfinterface.files.registerfile import RegisterFile
from cfinterface.storage import StorageType


class MyBinaryFile(RegisterFile):
    REGISTERS = [MyRecord]  # your Register subclasses
    STORAGE = StorageType.BINARY
    ENCODING = "utf-8"


file = MyBinaryFile.read("/path/to/file.bin")
```
The native fields (FloatField and
IntegerField) already implement
_binary_read and _binary_write using numpy internally. For custom
fields, you must implement these methods when subclassing
Field.
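The native fields do this with numpy. As a dependency-free illustration of what a _binary_read/_binary_write pair does, the sketch below decodes values at fixed byte offsets with the standard struct module. The record layout (an int32 code at byte 0 and a float32 value at byte 4, little-endian) and the helper names are assumptions for this example, not cfinterface defaults.

```python
# Hypothetical record layout for illustration only; not a cfinterface format.
import struct

# "<" = little-endian, no padding; "i" = int32, "f" = float32 (8 bytes total)
RECORD_FORMAT = "<if"


def read_code(record: bytes, start: int = 0) -> int:
    # Decode a little-endian 32-bit signed integer at byte offset `start`
    return struct.unpack_from("<i", record, start)[0]


def read_value(record: bytes, start: int = 4) -> float:
    # Decode a little-endian 32-bit float at byte offset `start`
    return struct.unpack_from("<f", record, start)[0]


def write_record(code: int, value: float) -> bytes:
    # Inverse operation: pack both values back into 8 bytes
    return struct.pack(RECORD_FORMAT, code, value)


record = write_record(7, 1.5)
print(read_code(record), read_value(record))  # 7 1.5
```

As with starting_position in binary mode, each value is located by its byte offset rather than by a character column.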
Note
Never use the literal string "BINARY" for the STORAGE attribute. This
usage has been deprecated since version 1.9.0. Always use
StorageType.BINARY.
How do I use the pandas integration?
Support for pandas is an optional dependency. Install it with (the quotes keep shells such as zsh from expanding the brackets):
```shell
pip install "cfinterface[pandas]"
```
Then use the _as_df() method available on
RegisterFile to obtain a
pandas.DataFrame with all records of a given type:
```python
from cfinterface.files.registerfile import RegisterFile


class MyFile(RegisterFile):
    REGISTERS = [RecordA, RecordB]


file = MyFile.read("/path/to/file.txt")
# Returns a DataFrame with all instances of RecordA
df = file._as_df(RecordA)
print(df.head())
```
pandas is imported lazily, inside _as_df(), so the rest of cfinterface works normally without pandas installed. The import fails only when _as_df() is called without the package present.
For tabular data within sections, the
TabularParser provides the static
method to_dataframe():
```python
from cfinterface.components.tabular import TabularParser
# parser and lines as in "How do I use TabularParser?" below; requires pandas
data = parser.parse_lines(lines)
df = TabularParser.to_dataframe(data)
```
How do I define a custom field?
Subclass Field and implement the
four abstract methods. The methods _textual_read and _textual_write
handle text mode; _binary_read and _binary_write handle binary mode.
```python
from cfinterface.components.field import Field


class BooleanField(Field):
    """Field that represents a boolean as 'S'/'N' in text
    or 0x01/0x00 in binary."""

    def _textual_read(self, line: str) -> bool:
        token = line[self._starting_position:self._ending_position].strip()
        return token == "S"

    def _textual_write(self) -> str:
        return ("S" if self._value else "N").ljust(self._size)

    def _binary_read(self, line: bytes) -> bool:
        return line[self._starting_position:self._ending_position] == b"\x01"

    def _binary_write(self) -> bytes:
        return b"\x01" if self._value else b"\x00"
```
The field can be used in any Line
in the same way as native fields:
```python
from cfinterface.components.line import Line

line = Line([BooleanField(size=1, starting_position=0)])
values = line.read("S")  # [True] (one value per field)
```
Note that _textual_write and _binary_write receive no arguments
besides self: the value to be written is read from self._value.
How do I resolve common parsing errors?
A field returns None
The field could not interpret the content of the line. Check that
starting_position and size are correct for the actual file. The method
read() catches ValueError and
returns None when the conversion fails.
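The simplified stand-in below shows why a misplaced field yields None instead of an exception. MiniField and MiniIntField are hypothetical classes written for this illustration; they mimic the behavior described above (dispatch to _textual_read or _binary_read, then catch ValueError) but are not cfinterface’s Field.

```python
# Hypothetical, simplified stand-in for a fixed-width field; not cfinterface's Field.
from abc import ABC, abstractmethod
from typing import Any, Optional, Union


class MiniField(ABC):
    def __init__(self, size: int, starting_position: int) -> None:
        self._size = size
        self._starting_position = starting_position
        self._ending_position = starting_position + size
        self._value: Optional[Any] = None

    def read(self, line: Union[str, bytes]) -> Optional[Any]:
        try:
            # Dispatch on the line type: bytes -> binary, str -> text
            if isinstance(line, bytes):
                self._value = self._binary_read(line)
            else:
                self._value = self._textual_read(line)
        except ValueError:
            # A failed conversion yields None instead of propagating
            self._value = None
        return self._value

    @abstractmethod
    def _textual_read(self, line: str) -> Any: ...

    @abstractmethod
    def _binary_read(self, line: bytes) -> Any: ...


class MiniIntField(MiniField):
    def _textual_read(self, line: str) -> int:
        return int(line[self._starting_position:self._ending_position])

    def _binary_read(self, line: bytes) -> int:
        chunk = line[self._starting_position:self._ending_position]
        return int.from_bytes(chunk, "little", signed=True)


print(MiniIntField(size=3, starting_position=2).read("AB123"))  # 123
print(MiniIntField(size=3, starting_position=0).read("AB123"))  # None
```

With starting_position=0 the slice is "AB1", int() raises ValueError, and the caller only sees None, which is why checking positions against the real file is the first step.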
Encoding error (UnicodeDecodeError)
The file uses an encoding different from those listed in ENCODING. Adjust
the file class’s ENCODING attribute to include the correct encoding:
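```python
from cfinterface.files.registerfile import RegisterFile


class MyFile(RegisterFile):
    REGISTERS = [...]  # your Register subclasses
    ENCODING = ["utf-8", "latin-1"]  # tries utf-8 first, then latin-1
```
The framework tries each encoding from the list in order and uses the first one that does not raise an error. Because latin-1 maps every byte to a character and never raises a decoding error, list it last: any encoding placed after it would never be tried.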
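This fallback rule can be sketched with the standard library alone. decode_with_fallback is a hypothetical helper, not part of cfinterface, and it assumes the whole content is decoded in one call.

```python
# Hypothetical helper illustrating ordered encoding fallback; not part of cfinterface.
from typing import List, Optional, Tuple


def decode_with_fallback(data: bytes, encodings: List[str]) -> Tuple[str, str]:
    """Return (text, encoding) using the first encoding that decodes `data`."""
    last_error: Optional[UnicodeDecodeError] = None
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError as exc:
            last_error = exc  # remember the failure and try the next encoding
    if last_error is None:
        raise ValueError("no encodings were given")
    raise last_error


print(decode_with_fallback(b"caf\xe9", ["utf-8", "latin-1"]))  # ('café', 'latin-1')
```

For example, with ["latin-1", "utf-8"] the UTF-8 bytes b"caf\xc3\xa9" come back as "cafÃ©" tagged as latin-1, since the first entry never fails.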
Record not found in the file
Check that IDENTIFIER and IDENTIFIER_DIGITS exactly match the prefix in
the file. IDENTIFIER_DIGITS must equal the number of characters (or bytes,
in binary mode) that form the identifier. An incorrect value causes the method
matches() to never return
True.
```python
from cfinterface.components.register import Register
from cfinterface.components.line import Line
from cfinterface.components.literalfield import LiteralField


class NameRecord(Register):
    IDENTIFIER = "NM"  # exactly 2 characters
    IDENTIFIER_DIGITS = 2  # must match len(IDENTIFIER)
    LINE = Line([LiteralField(size=20, starting_position=2)])
```
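A simplified model of the prefix check helps explain why a wrong IDENTIFIER_DIGITS breaks matching. The matches function below is a hypothetical stand-in that assumes the check compares the first IDENTIFIER_DIGITS characters (or bytes) of each line with IDENTIFIER; see the API reference for the actual implementation.

```python
# Hypothetical simplification of a prefix check; not cfinterface's Register.matches().
from typing import Union


def matches(line: Union[str, bytes], identifier: Union[str, bytes], digits: int) -> bool:
    # Compare only the first `digits` characters (str) or bytes (bytes) of the line
    return line[:digits] == identifier


print(matches("NM John Smith", "NM", 2))  # True
print(matches("NM John Smith", "NM", 3))  # False: "NM " != "NM"
print(matches(b"NM\x00\x01", b"NM", 2))  # True (binary mode)
```

With digits=3 the slice always contains one extra character, so no line can ever equal the two-character identifier.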
How do I use TabularParser?
TabularParser converts lists of
text lines into a dictionary of lists indexed by column name, and also
provides the inverse operation.
The column schema is declared with
ColumnDef. Each ColumnDef requires
its own field instance – fields are mutated in-place during reading and cannot
be shared between columns.
```python
from cfinterface.components.tabular import TabularParser, ColumnDef
from cfinterface.components.literalfield import LiteralField
from cfinterface.components.floatfield import FloatField

columns = [
    ColumnDef(name="product", field=LiteralField(size=20, starting_position=0)),
    ColumnDef(name="price", field=FloatField(size=10, starting_position=20, decimal_digits=2)),
]
parser = TabularParser(columns)

# Fixed-width lines: product in columns 0-19, price in columns 20-29
lines = [
    "Product A".ljust(20) + "12.50".ljust(10),
    "Product B".ljust(20) + "7.99".ljust(10),
]
data = parser.parse_lines(lines)
# {"product": ["Product A", "Product B"], "price": [12.5, 7.99]}
```
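To see what a parse and its inverse amount to without the library, here is a stdlib-only sketch of fixed-width column parsing. Column, parse_lines and format_lines are hypothetical names, and format_lines assumes the columns are contiguous and ordered from position 0. Because these specs are frozen dataclasses holding no read state, sharing one between columns would be harmless here; cfinterface fields, by contrast, are mutated in place, hence the one-instance-per-ColumnDef rule.

```python
# Hypothetical sketch of fixed-width tabular parsing; not cfinterface's TabularParser.
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class Column:
    name: str
    start: int  # 0-based starting position
    size: int  # width in characters
    parse: Callable[[str], Any]  # text -> value
    fmt: Callable[[Any], str]  # value -> text


def parse_lines(columns: List[Column], lines: List[str]) -> Dict[str, List[Any]]:
    # One list per column, filled line by line
    data: Dict[str, List[Any]] = {col.name: [] for col in columns}
    for line in lines:
        for col in columns:
            raw = line[col.start:col.start + col.size].strip()
            data[col.name].append(col.parse(raw))
    return data


def format_lines(columns: List[Column], data: Dict[str, List[Any]]) -> List[str]:
    # Inverse operation: left-justify each value into its slot
    # (assumes contiguous columns ordered by start)
    n_rows = len(data[columns[0].name]) if columns else 0
    return [
        "".join(col.fmt(data[col.name][i]).ljust(col.size)[:col.size] for col in columns)
        for i in range(n_rows)
    ]


columns = [
    Column("product", 0, 20, str, str),
    Column("price", 20, 10, float, lambda v: f"{v:.2f}"),
]
lines = ["Product A".ljust(20) + "12.50".ljust(10), "Product B".ljust(20) + "7.99".ljust(10)]
data = parse_lines(columns, lines)
print(data)  # {'product': ['Product A', 'Product B'], 'price': [12.5, 7.99]}
```

Formatting the parsed data back with format_lines reproduces the original lines, which is the round trip the parser’s inverse operation is meant to provide.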
For integration with SectionFile, use
TabularSection and declare only the
class attributes COLUMNS, HEADER_LINES, END_PATTERN, and
DELIMITER. See the complete example in the gallery:
Tabular Parsing.
How does file versioning work?
Versioning allows the same file class to read content produced by different versions of the schema, without needing separate classes.
Declare the class attribute VERSIONS as a dictionary mapping version keys
(strings compared lexicographically) to lists of component types:
```python
from cfinterface.files.registerfile import RegisterFile
from cfinterface.storage import StorageType


class VersionedFile(RegisterFile):
    REGISTERS = [RecordV2A, RecordV2B]  # default schema (most recent)
    VERSIONS = {
        "1.0": [RecordV1A],
        "2.0": [RecordV2A, RecordV2B],
    }
    STORAGE = StorageType.TEXT


# Selects the schema of the most recent version <= "1.5", i.e., "1.0"
file = VersionedFile.read("/path/to/file.txt", version="1.5")
# Validates whether the read content matches the expected schema
result = file.validate(version="1.0")
print(result.matched)  # True if all expected types were found
print(result.missing_types)  # expected types that did not appear in the file
```
The function resolve_version() performs the
lexicographic resolution: given version="1.5" and keys ["1.0", "2.0"],
it returns the components for "1.0" (the largest key less than or equal to "1.5").
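The resolution rule can be reproduced in a few lines. resolve below is a hypothetical re-implementation for illustration, not resolve_version itself; in particular, returning None when no key is less than or equal to the requested version is an assumption of this sketch.

```python
# Hypothetical re-implementation of lexicographic version resolution.
from typing import Dict, List, Optional


def resolve(versions: Dict[str, List[str]], version: str) -> Optional[List[str]]:
    # Keys are compared as plain strings, i.e. lexicographically
    eligible = [key for key in versions if key <= version]
    if not eligible:
        return None  # assumption: no schema applies
    return versions[max(eligible)]


versions = {"1.0": ["RecordV1A"], "2.0": ["RecordV2A", "RecordV2B"]}
print(resolve(versions, "1.5"))  # ['RecordV1A']
print(resolve(versions, "2.0"))  # ['RecordV2A', 'RecordV2B']
print(resolve(versions, "0.9"))  # None
print("10.0" < "9.0")  # True: lexicographic, not numeric
```

Because the comparison is on strings, "10.0" sorts before "9.0"; keeping version keys the same width (for example "09.0" and "10.0") keeps string order and numeric order in agreement.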
For complete versioning examples, see the gallery example: Versioned BlockFile.