Performance Tips ================ ``cfinterface`` v1.9.0 includes several internal optimizations that improve performance in file reading and writing scenarios. Beyond these automatic improvements, there are usage patterns that allow developers to extract even more performance when modeling files with the framework. This page describes the internal optimizations and guides how to take advantage of them through conscious design choices. Regex Cache in Adapters ------------------------ In previous versions, each read call that used a regular expression pattern recompiled the pattern from the original string. Starting from v1.9.0, the adapter module maintains a global dictionary ``_pattern_cache`` that stores the compiled objects for each pattern after its first use. The impact is transparent to the user: the same pattern passed in ``BEGIN_PATTERN`` or ``END_PATTERN`` of a :class:`~cfinterface.components.block.Block` is compiled only once, no matter how many times the file is read during program execution. No additional action is required; the cache is automatic. The only consideration is to keep patterns as stable literal strings in the class definition, avoiding the construction of dynamic patterns at runtime, which would generate distinct cache entries and negate the benefit. .. code-block:: python from cfinterface.components.block import Block class MyBlock(Block): # Pattern compiled once and reused across all reads BEGIN_PATTERN = r"^BEGIN" END_PATTERN = r"^END" FloatField Optimization ------------------------- The ``_textual_write()`` method of :class:`~cfinterface.components.floatfield.FloatField` was rewritten to perform at most three formatting attempts, regardless of the value of ``decimal_digits``. The previous implementation iterated through a loop of size O(decimal_digits) to find the number of decimal places that fits in the field. To get the most out of this optimization, declare ``size`` and ``decimal_digits`` with the minimum values needed to represent the values in your domain. Oversized fields still work correctly, but fields sized to the actual value eliminate unnecessary formatting attempts. .. code-block:: python from cfinterface.components.floatfield import FloatField from cfinterface.components.line import Line # Prefer size adjusted to the domain of the value price_field = FloatField(size=10, starting_position=0, decimal_digits=2) # Avoid unnecessarily large fields # price_field = FloatField(size=30, starting_position=0, decimal_digits=15) line = Line([price_field]) Array-Based Containers ----------------------- The container classes :class:`~cfinterface.data.registerdata.RegisterData`, ``BlockData``, and ``SectionData`` have been migrated from linked-list structures to Python lists (``list``) with an auxiliary index by type. The main practical consequences are: - ``len()`` is now O(1) instead of O(n) as in the previous implementation - Iteration over all elements remains O(n), but with better memory locality (contiguous elements in memory) - The ``previous`` and ``next`` properties of records, blocks, and sections are now computed from the position in the container, with no additional storage cost This gain is automatic for any code that uses the existing file classes without modification. Batch Reading with read_many() -------------------------------- When multiple files of the same type need to be read, the loop pattern with individual instantiation can be replaced by the class method :meth:`~cfinterface.files.registerfile.RegisterFile.read_many`, available on :class:`~cfinterface.files.registerfile.RegisterFile`, :class:`~cfinterface.files.blockfile.BlockFile`, and :class:`~cfinterface.files.sectionfile.SectionFile`. .. code-block:: python # Before: individual reading in a loop from my_module import MyFile files = [] for path in paths: f = MyFile.read(path) files.append(f) .. code-block:: python # After: batch reading with read_many() from my_module import MyFile # Returns a dict[str, MyFile] keyed by path files = MyFile.read_many(paths) # Access by path file = files["/path/to/file.txt"] The ``read_many()`` method accepts the optional ``version`` parameter to select the versioning schema, in the same way as ``read()``: .. code-block:: python files = MyFile.read_many(paths, version="1.0") Column Selection in TabularParser ------------------------------------ The :class:`~cfinterface.components.tabular.TabularParser` parses positional (or delimited) text lines and converts each declared column into a list of values. In large tabular files, declaring only the necessary columns reduces the type conversion work for each line read. Use :class:`~cfinterface.components.tabular.ColumnDef` to list only the columns of interest, omitting the rest: .. code-block:: python from cfinterface.components.tabular import TabularParser, ColumnDef from cfinterface.components.literalfield import LiteralField from cfinterface.components.floatfield import FloatField from cfinterface.components.integerfield import IntegerField # File has 5 columns; only 2 are needed required_columns = [ ColumnDef(name="code", field=LiteralField(size=8, starting_position=0)), ColumnDef(name="value", field=FloatField(size=12, starting_position=20, decimal_digits=4)), # Columns at positions 8-19 and 32+ are simply ignored ] parser = TabularParser(required_columns) data = parser.parse_lines(lines) # {"code": [...], "value": [...]} For tabular sections integrated with the framework, declare only the necessary columns in the ``COLUMNS`` class attribute of your :class:`~cfinterface.components.tabular.TabularSection` subclass: .. code-block:: python from cfinterface.components.tabular import TabularSection, ColumnDef from cfinterface.components.literalfield import LiteralField from cfinterface.components.floatfield import FloatField class DataSection(TabularSection): COLUMNS = [ ColumnDef(name="id", field=LiteralField(size=8, starting_position=0)), ColumnDef(name="result", field=FloatField(size=12, starting_position=20, decimal_digits=3)), ] HEADER_LINES = 1 END_PATTERN = r"^---" General Tips ------------- The following tips complement the internal optimizations described above and apply to any code that uses ``cfinterface``. **Reuse file class instances for multiple reads** The ``read()`` method is a class method that returns a new instance on each call. When the same file needs to be read again (for example, after a write), prefer saving and reusing the existing instance or use ``read_many()`` for a known set of paths at once. **Use the StorageType enum instead of literal strings** The ``STORAGE`` attribute accepts both strings (``"TEXT"``, ``"BINARY"``) and the enum :class:`~cfinterface.storage.StorageType`. The use of strings has been deprecated since v1.9.0 and emits a warning at runtime. Always prefer the enum: .. code-block:: python from cfinterface.files.registerfile import RegisterFile from cfinterface.storage import StorageType class MyFile(RegisterFile): REGISTERS = [MyRecord] STORAGE = StorageType.TEXT # correct # STORAGE = "TEXT" # deprecated; avoid **Declare ENCODING as a single string when the encoding is known** The ``ENCODING`` attribute accepts a single string or a list of strings. When passed as a list, the framework tries each encoding in order until a read succeeds. If the file encoding is known and fixed, declare ``ENCODING`` as a string directly to eliminate the unnecessary attempts: .. code-block:: python class MyFile(RegisterFile): REGISTERS = [MyRecord] ENCODING = "latin-1" # direct read, no extra attempts # ENCODING = ["latin-1", "utf-8"] # only needed when the # # encoding may vary