Tuesday, January 23, 2024

Onion architecture in Spring Boot

I have a new microservice project and I need to decide on its architecture. I think onion architecture is not common in Spring Boot projects, but my lead prefers it; his stack is .NET. Could you share your experience with that?
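
For context, my rough mental model of how the onion layers would map onto a Spring Boot service is something like this (all class names and roles are made up for illustration; Spring Boot 3 / Java 17 assumed):

import org.springframework.stereotype.Repository;
import org.springframework.stereotype.Service;

// Domain layer (innermost ring): plain business types, ideally with no Spring imports.
record Order(String id) { }

interface OrderRepository {        // a "port" owned by the domain
    void save(Order order);
}

// Application layer: use cases depend only on the domain abstractions.
@Service
class PlaceOrderUseCase {
    private final OrderRepository orders;

    PlaceOrderUseCase(OrderRepository orders) {
        this.orders = orders;
    }

    void place(Order order) {
        orders.save(order);
    }
}

// Infrastructure layer (outermost ring): the Spring adapter implementing the domain port.
@Repository
class JpaOrderRepository implements OrderRepository {
    @Override
    public void save(Order order) {
        // persist via JPA in a real adapter
    }
}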

What do you think is the best microservice pattern for Spring Boot projects?

Tuesday, January 16, 2024

Should I, or can I, use an if __name__ == "__main__": style construct, but within an imported package or module?

The simple Python file hello.py is executed in the top-level environment as __main__:

#hello.py
def hello(input):
    print(f"hello, {input}")


if __name__ == '__main__':
    hello(input='world!')

I know from the docs and from previous Stack Overflow Q&As that this pattern is somewhat of a formality in standalone scripts and simple packages, but it is good practice and can be important: without the if __name__ == '__main__': guard, the top-level code would also run if the script were ever imported as a module.

I now have a module foobar.py, either installed or temporarily added to sys.path, and I want to import it in hello.py and then, for example, execute a function or instantiate some classes it defines during the import (or perhaps later in the main script):

# foobar.py
def foobar():
    print("foo bar baz")

Is it written anywhere in the docs, or implied, that I can or should include the following:

if __name__ == 'foobar':
    foobar()

in a similar manner to if __name__ == '__main__':, but within files meant to be executed as imports?

The point of doing this would be to guarantee execution of the foobar() function during the import, as opposed to simply calling foobar() either at the end of the foobar.py file or, after its import, from within hello.py, which to my understanding would execute the function in every case I can see.
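
To make the intent concrete, the variant I am asking about would look something like this (just a sketch; note the assumption that the module is imported under the top-level name foobar - inside a package, __name__ would be 'somepackage.foobar' instead):

# foobar.py - sketch of the guarded-on-import idea
def foobar():
    print("foo bar baz")

# Runs when the file is imported via 'import foobar', because on import
# __name__ is set to the module's name rather than to '__main__'.
if __name__ == 'foobar':
    foobar()

# By contrast, a bare top-level call would run both on import and when
# the file is executed directly as a script:
# foobar()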

Similarly, should I also include a blank:

if __name__ == '__main__':
    pass

in modules and scripts that I never intend to be run as main? Are there any obvious gotchas to avoid when doing this?

Saturday, January 13, 2024

Writing an async version of a method in a DRY fashion?

The documentation of System.IO.Stream specifies that, when deriving from it, at a minimum only the Read method needs to be implemented.

This automatically enables the use of the ReadAsync method on the derived type, i.e. you get asynchronous reading for free.

Looking at how it is implemented, it is pretty involved. There is a custom class (ReadWriteTask) deriving from Task<int> which is responsible for the asynchronous reading.

A naive approach to staying DRY could be the following:

public abstract class Reader
{
    public abstract int Length { get; }

    public bool TryRead(Stream stream, byte[] buffer)
    {
        if (stream.Length - stream.Position < Length)
        {
            return false;
        }

        stream.ReadExactly(buffer);

        // do some special, lengthy stuff

        return true;
    }

    public async Task<bool> TryReadAsync(Stream stream, byte[] buffer)
    {
        return await Task.Run(() => TryRead(stream, buffer)).ConfigureAwait(false);
    }
}

However, after looking at their implementation, I have the feeling that the above approach may be wrong/inefficient.
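
For comparison, the hand-written, truly asynchronous variant I would otherwise end up with inside the same Reader class looks roughly like this (assuming .NET 7+ for ReadExactlyAsync; the near-duplication of TryRead is exactly what I would like to avoid):

    public async Task<bool> TryReadAsync(Stream stream, byte[] buffer, CancellationToken cancellationToken = default)
    {
        if (stream.Length - stream.Position < Length)
        {
            return false;
        }

        // Truly asynchronous I/O instead of blocking a thread-pool thread.
        await stream.ReadExactlyAsync(buffer, cancellationToken).ConfigureAwait(false);

        // do some special, lengthy stuff (duplicated from TryRead)

        return true;
    }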

Can you suggest a pattern to tackle this problem?

Thursday, January 11, 2024

Why do we use a factory class to instantiate an interface?

I'm currently reading the book Pro Spring 6, and I am a bit confused about one example from the book.

Hello World Application

Here is the main class:

public class HelloWorldDecoupledWithFactory {

    public static void main(String... args) {
        MessageRenderer mr = MessageSupportFactory.getInstance().getMessageRenderer()
                .orElseThrow(() -> new IllegalArgumentException("Service of type 'MessageRenderer' was not found!"));
        MessageProvider mp = MessageSupportFactory.getInstance().getMessageProvider()
                .orElseThrow(() -> new IllegalArgumentException("Service of type 'MessageProvider' was not found!"));
        mr.setMessageProvider(mp);
        mr.render();
    }
}

The book mentions that there is a small problem: "changing the implementation of either the MessageRenderer or MessageProvider interface means a change to the code. To get around this we need to delegate the responsibility of retrieving the two implementation types and instantiating them to somebody else. The most manual one is to create a simple factory class, that reads the implementation class names from a properties file and instantiates them on behalf of the application"

I'm a bit confused by the statement that "changing the implementation of either the MessageRenderer or MessageProvider interface means a change to the code". For example, if we add an additional method to the interface, the classes that implement it will have to change anyway. So what is the purpose of using a factory?

public class MessageSupportFactory {

    private static MessageSupportFactory instance;
    private Properties props;
    private MessageRenderer renderer;
    private MessageProvider provider;

    private MessageSupportFactory() {
        props = new Properties();
        try {
            props.load(this.getClass().getResourceAsStream("/msf.properties"));
            String rendererClass = props.getProperty("renderer.class");
            String providerClass = props.getProperty("provider.class");
            renderer = (MessageRenderer) Class.forName(rendererClass).getDeclaredConstructor().newInstance();
            provider = (MessageProvider) Class.forName(providerClass).getDeclaredConstructor().newInstance();
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }

    static {
        instance = new MessageSupportFactory();
    }

    public static MessageSupportFactory getInstance() {
        return instance;
    }

    public Optional<MessageRenderer> getMessageRenderer() {
        return renderer != null ? Optional.of(renderer) : Optional.empty();
    }

    public Optional<MessageProvider> getMessageProvider() {
        return provider != null ? Optional.of(provider) : Optional.empty();
    }
}
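
For reference, the msf.properties file it loads is just a role-to-class mapping, something like the following (the fully qualified names below are hypothetical; the book's actual packages and implementation classes may differ):

# msf.properties
renderer.class=com.example.StandardOutMessageRenderer
provider.class=com.example.HelloWorldMessageProvider

If I understand correctly, swapping in a different implementation then only means editing this file rather than recompiling HelloWorldDecoupledWithFactory, but I still don't see why that matters if the interface itself changes.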

Wednesday, January 10, 2024

Linked list vs GC in CLR

Given my not exactly very deep understanding of how the GC works (it's about 40k lines of source in the CLR runtime, which I just can't take in) and the need for data storage that is both doubly linked and very big, let's consider whether it is actually a viable solution to use a System.Collections.Generic.LinkedList<T> as the storage for it.

For a better understanding, here are some hints about the data to be stored (you can skip these as well):

  1. The entire collection should be locked in memory all the time. That means no swapping, no being paged out of memory, and no 'manual' writing to a file, because access to the first and last elements should be very fast.
  2. Data is continuously both added to the "head" and removed from the "tail". The data rate can be pretty high, meaning a lot of new objects are created and added to the front while a lot of data is detached from the back of the list, making those objects eligible for GC.
  3. The amount of data items added is always equal to the amount of items removed.
  4. The amount of data to be stored is actually pretty high too. The rough estimate is 20 to 200 MB of memory. The amount of required memory is fixed and known at 'collection' instantiation time.
  5. Every data item is a strongly typed, recursively unmanaged struct. That means it can easily be serialized to a byte array and back.
  6. It should always be trivial to access at most the two neighbouring elements.
  7. The performance impact of using such a big data store should be minimal.

The main concern that bothers me is as follows: won't it rely too heavily on the GC, making the GC use a lot of CPU time (GC pressure) just to traverse the object graph? Or maybe the sweep phase will become a nightmare? My poor understanding and basic intuition tell me that (a huge quantity of objects) + (every one of them holding a reference to the next one) + (a lot of newly created 2nd-generation objects that get detached from the tail) = (a lot of GC time).

Should I rather consider using the unmanaged heap as a solution? Because the size is fixed, the approximate procedure as I see it would be as follows:

  1. At application initialization, request a huge page from the OS. The best would be a single 256 MB page (I'm not sure whether a 256 MB page is possible on x86_64 or ARM). Fall back to a regular malloc (the System.Runtime.InteropServices.NativeMemory.Alloc(nuint) method in the CLR), or to VirtualAlloc on Windows via a kernel call, or some Linux equivalent.
  2. Allocate some kind of free list in this region.
  3. Keep track of the amount of data added. As soon as the region is full, simply overwrite 'tail' elements.
  4. Drop (free) the region once it is no longer needed for the app's operation.

Personally I would prefer the latter, because it requires neither deallocating every single detached object nor any sweeping. Using huge pages where possible will also make pointer dereferencing faster (I believe this is the case in the CLR as well). Another benefit is an implementation detail, but useful to know in case you didn't: huge pages are locked in memory by the operating system, so they will never be swapped out. On the other hand, it will require a little bit of a self-made PAL (Platform Abstraction Layer) and clearly some 'unsafe' code - either using pointers and C#'s unsafe, or the System.Runtime.CompilerServices.Unsafe static methods along with managed refs and in parameters. Another way would perhaps be some kind of ring-shaped array, but let's save that for another question on SO.
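
To make the unmanaged option concrete, the kind of structure I have in mind looks roughly like this (only a sketch over NativeMemory, .NET 6+ assumed, without huge-page handling or bounds checks; all names are made up):

using System;
using System.Runtime.InteropServices;

// Sketch: fixed-capacity ring over native memory; the GC never sees the items.
public sealed unsafe class NativeRing<T> : IDisposable where T : unmanaged
{
    private readonly T* _items;
    private readonly int _capacity;
    private int _head;   // next slot to write
    private int _count;

    public NativeRing(int capacity)
    {
        _capacity = capacity;
        _items = (T*)NativeMemory.Alloc((nuint)capacity, (nuint)sizeof(T));
    }

    // Add to the "head"; once the region is full, the oldest ("tail") slot is overwritten.
    public void Push(in T item)
    {
        _items[_head] = item;
        _head = (_head + 1) % _capacity;
        if (_count < _capacity) _count++;
    }

    // Fast access to both ends (undefined before the first Push).
    public ref T Newest => ref _items[(_head - 1 + _capacity) % _capacity];
    public ref T Oldest => ref _items[(_head - _count + _capacity) % _capacity];

    public void Dispose() => NativeMemory.Free(_items);
}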

I know that the best solution is to benchmark both approaches, but based on your experience, how would you implement this?

Retry Mechanism

I am seeking guidance for an issue related to an image downloading system. Our system interacts with various external suppliers via Kafka messages, and each message contains a list of image URLs from different domains to be downloaded. The challenge is that we have to respect the rate limits of these domains. If we make too many requests, we receive a 429 "Too Many Requests" response. We tried to mitigate this by implementing an exponential backoff strategy, but some domains still rate-limit us even after we back off for over a minute. Additionally, we must maintain sequential processing for each supplier's messages to avoid database conflicts, meaning that the messages from a supplier must be processed in their original order of arrival. We're left wondering how to navigate these rate limits effectively while ensuring sequential request handling. Any insights would be greatly appreciated!

I am thinking of solutions like the Circuit Breaker pattern. Kindly suggest some ideas.
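
For what it's worth, one direction I have sketched so far is a per-domain gate that delays requests and honours Retry-After on 429s, while the consumer thread simply waits so that each supplier's messages keep their original order (plain Java, no particular library; all names are made up):

import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: remembers, per domain, the earliest moment the next request may be sent.
public class DomainRateGate {

    private final Map<String, Instant> nextAllowed = new ConcurrentHashMap<>();

    // Block until the domain may be called again. Waiting (rather than skipping
    // the message) is what preserves per-supplier ordering.
    public void awaitPermission(String domain) throws InterruptedException {
        Instant until = nextAllowed.get(domain);
        if (until == null) {
            return;
        }
        long sleepMillis = Duration.between(Instant.now(), until).toMillis();
        if (sleepMillis > 0) {
            Thread.sleep(sleepMillis);
        }
    }

    // Called after a 429: push the domain's next allowed time forward,
    // preferring the server's Retry-After hint over our own backoff guess.
    public void reportTooManyRequests(String domain, Duration retryAfter) {
        Instant until = Instant.now().plus(retryAfter);
        nextAllowed.merge(domain, until, (a, b) -> a.isAfter(b) ? a : b);
    }
}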