dimanche 14 août 2022

What design pattern might help us implement a simple stream filter (or interpreter)?

This is kind of a "white-board" question instead a "keyboard" question as it is not language-specific.

I often have a stream of characters which must be translated from one "language" to another...

EXAMPLE ONE: UPS & DOWNS

+---------------+---------+--------+---------+--------+---------+---------+---------+
| INPUT STREAM  | (-1, 0) | (0, 1) | (0, 1)  | (0, 1) | (0, -1) | (0, 1)  | (0, 1)  |
+---------------+---------+--------+---------+--------+---------+---------+---------+
| OUTPUT STREAM | "left"  | "up"   | "right" | "up"   | "down"  | "right" | "right" |
+---------------+---------+--------+---------+--------+---------+---------+---------+

EXAMPLE TWO: Toggling Between Cases

You can force stream output to be pure lower-case or pure upper-case and use a "$" character to toggle between different modes:


+---------+----------+-------------+-----------+----------------+----------------+---------------+---------------+--------------+--------------+
| INPUT:  | "ApPleS" |     "$"     | "BaNaNaS" |      "$"       | "BLACKberries" |      "$"      | "BlUeBeRrIeS" |     "$"      | "CaNtAlOuPe" |
+---------+----------+-------------+-----------+----------------+----------------+---------------+---------------+--------------+--------------+
| OUTPUT: | "APPLES" | toggle mode | "bananas" | "BLACKBERRIES" | toggle mode    | "blueberries" | toggle mode   | toggle mode  | "CANTALOUPE" |
+---------+----------+-------------+-----------+----------------+----------------+---------------+---------------+--------------+--------------+

Suppose that you had a class named CharEater which eats elements of a stream and sends new (translated) elements to another stream.

Exactly what characters the CharEater outputs should vary depending on what "mode" the stream filter is in.

Maybe are only a handful of modes (4 to 7 modes) that the class can be in.

I usually write very ugly code which amounts to a bunch of:

  • nested loops
  • if-else clauses
  • boolean flags.

For some reason, my boolean flags usually have names beginning with the word "is" or "has", such as:

  • "is_valid"
  • "is_digit"
  • "is_red".
  • "is_blue".
  • "has_hit_rock_bottom".

An example of some ugly code is shown below:

import io

def eat_the_string(chars:str):
    """
    The implementation here is ugly

    However...
        we accept a string as input

        we convert letters to upper case or lower case

        A dollar sign ($) is used to toggle between upper-case mode
        and lower-case mode.

    EXAMPLE
        INPUT:
            "ApPleS $ BaNaNaS $ BLACKberries $ BlUeBeRrIeS $ CaNtAlOuPe"
        OUTPUT:
            "APPLES  bananas  BLACKBERRIES  blueberries  CANTALOUPE"
    """
    strm = io.StringIO() # a string stream
    is_lowercase = False
    for ch in chars:
        if ch == "$":
            is_lowercase = not is_lowercase
        else:
            if is_lowercase:
                print(
                    ch.lower(),
                    file=strm, sep = "", end = ""
                )
            else: # in uppercase mode
                print(
                    ch.upper(),
                    file=strm, sep="", end=""
                )
    return strm.getvalue()

I am seeking a more elegant solution.

A bunch of nested loops with a handful of boolean flags is not great for many reasons.

Two drawbacks are:

  • the mess of boolean flags and test conditions is difficult for human beings to read and understand
  • a lot of the test-conditions are redundant. I sometimes an if statement for is_first_iteration == True even though it is guaranteed that is_first_iteration will be False after the first loop iteration.

What is an example of a better design pattern (or approach) for implementing a stream filter whose output changes based on what mode the stream filter is in? We probably do not need a full-blown general-purpose parser and interpreter, but that is one option.

Below is one attempt to re-factor the mess of nested-loops with boolean flags. Feel free to steer me in a different direction.

# import tools for abstract base classes
import io
from abc import ABC, abstractmethod

class PacAbstract(ABC):
    # This is an abstract base class

    @abstractmethod
    def process_a_char(this, ch: str):
        """
            Input paramater `old_ch` is a char, such as the letter "A"
        """
        pass  # `pass` is no_op().... no_operation()... do nothing.


class PacModeLowercase(PacAbstract):
    """
        `PacModeLowercase` converts
        string characters into lower-case letters
    """

    def process_a_char(this: PacAbstract, ch: str):
        """
            Input parameter `old_ch` is a char, such as the letter "A"
        """
        return ch.lower()


class PacModeUppercase(PacAbstract):
    """
         This class represents a **mode**
         for an `PacMan` to be in.

        `PacModeUppercase` converts
        string characters into upper-case letters
    """

    def process_a_char(this: PacAbstract, ch: str):
        # Input paramter `old_ch` is a string, such as "hello world"
        return ch.upper()


class PacMan(PacAbstract):
    PacModeUppercase = PacModeUppercase
    PacModeLowercase = PacModeLowercase
    # Like the videogame character named "pacman,"
    # a `PacMan` eats things.
    #
    # a PacMan has a pointer (or reference) to a PacMode
    # `PacMode` represents the **mode** which the PacMan is in
    #
    # The mode determines what PacMan outputs.

    def __init__(this: PacAbstract, *, mode: str = "lower"):
        # __init__ is sort of like a class constructor.
        #
        # However, the true constructor is type.__call__()
        #
        # In other languages,
        # `__call__()` is sometimes known as ``
        #
        # All of the following are equvilant:
        #    instance = PacMan(1, 2, 3)
        #    instance = type.__call__(PacMan, 1, 2, 3)

        # Error checking. make sure mode is string-like
        mode = "".join(str(ch) for ch in mode)

        # Get modified copy with trimmed leading and traiPacAbstract white space
        mode = mode.strip()

        # Get modified copy with uppercase letters
        # converted to lowercase letters
        mode = mode.lower()

        if mode == "lower":
            this._pac_mode = type(this).PacModeLowercase()
        else:
            this._pac_mode = type(this).PacModeUppercase()

    def process_a_char(this: PacAbstract, ch: str):
        """
            Input paramter `old_ch` is a string, such as "hello world"
        """
        return this._pac_mode.process_a_char(ch)


pacman = PacMan()

strm = io.StringIO()
inputs = "ApPleS $ BaNaNaS $ BLACKberries $ BlUeBeRrIeS $ CaNtAlOuPe"
for old_ch in inputs:
    new_ch = pacman.process_a_char(old_ch)
    print(new_ch, file=strm, sep="", end="")

print(strm.getvalue())
# output is "apples $ bananas $ blackberries $ blueberries $ cantaloupe"

Anyway.... what is the name of a design pattern such that the data coming out of a stream filter varies depending on what "mode" the stream filter is in?

Below is code written to convert camel-case to lower-case letters and under-scores.

The code is written in the anti-pattern of nested loops, boolean flags, and buffers:

def camel_to_underscore(chars:str):
    """
    The implementation here is an anti-pattern 

    The code is hideous mess of nested-loops, boolean
    flags, test-conditions, and string buffers. 

    EXAMPLE A:
        INPUT:  "getUserInput()"
        OUTPUT: "get_user_input()
        
    EXAMPLE B:
        INPUTS: 
                RedDeliciousApples
                GrannySmithApples
                ConcordGrapes
                GreenGrapes
        Outputs:
                red_delicious_apples
                granny_smith_apples
                concord_grapes
                green_grapes
    """
    chars = "".join(str(ch) for ch in chars)
    it = iter(chars)
    buffer = []
    buffer.append(next(it).lower())
    for ch in it:
        low_ch = ch.lower()
        if low_ch != ch:
            buffer.append("_")
        buffer.append(low_ch)
    return "".join(buffer)

If you do not have the name of a design pattern, do you have some pseudo-code different from the nest-loops with boolean flags? Maybe some sort of class object whose method outputs change depending on what "mode" the class is in?

EXAMPLE THREE: Removing Consecutive Duplicates

An example of a more complicated application might be to delete repeated sub-strings:

  • we read a string from left to right
  • if the same sub-string repeats itself ("Echo Echo Echo Echo") then we delete the redundant copies.
  • we allow a single character to repeat up to 2 times, but no other string may repeat.
INPUT OUTPUT
"Misssssssssissssssssippppppppppi" "Mississippi"
"apple apple apple apple" "apple"
"orange orange orange orange orange" "orange"
"\n\n\n\n\n\n\n\n" "\n"

Aucun commentaire:

Enregistrer un commentaire