Tuesday, May 11, 2021

Read and pattern-match files bigger than memory

I have two files that contain URLs.

file1.txt

*.aa.com
*.bb.com/*
....
...

file2.txt

www.aaa.com
aa.bb.cc/ddx/eee
..

Based on this, both entries in file2.txt count as present in file1.txt, with the constraint that “*” is a wildcard that matches differently before and after the delimiter:

a) In the hostname, before the first “/”, it matches any run of characters other than “/”.
b) In the pathname, after the first “/”, it matches any run of characters.
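
To make the two wildcard rules concrete, here is a minimal Go sketch of a matcher for a single pattern and a single URL. The names splitURL, glob, and matchURL are hypothetical, and the sample inputs in main are illustrative rather than taken from the files. It assumes that a pattern without “/” matches only URLs without a path, and that “*” may also match an empty run.

```go
package main

import (
	"fmt"
	"strings"
)

// splitURL splits s at the first "/" into host and path (the "/" itself is dropped).
func splitURL(s string) (host, path string, hasPath bool) {
	i := strings.IndexByte(s, '/')
	if i < 0 {
		return s, "", false
	}
	return s[:i], s[i+1:], true
}

// glob reports whether s matches pattern, where "*" matches any run of
// bytes, including an empty one. It backtracks to the most recent "*".
func glob(pattern, s string) bool {
	p, i := 0, 0
	star, mark := -1, 0
	for i < len(s) {
		switch {
		case p < len(pattern) && pattern[p] == '*':
			star, mark = p, i // remember the "*" and where it started matching
			p++
		case p < len(pattern) && pattern[p] == s[i]:
			p++
			i++
		case star >= 0:
			mark++ // let the last "*" absorb one more byte and retry
			i = mark
			p = star + 1
		default:
			return false
		}
	}
	for p < len(pattern) && pattern[p] == '*' {
		p++
	}
	return p == len(pattern)
}

// matchURL applies glob separately to the host and path parts. Because the
// host part never contains "/", a "*" in the host cannot match "/".
// Assumption: a pattern and a URL must both have, or both lack, a path.
func matchURL(pattern, url string) bool {
	pHost, pPath, pHas := splitURL(pattern)
	uHost, uPath, uHas := splitURL(url)
	if pHas != uHas {
		return false
	}
	return glob(pHost, uHost) && (!pHas || glob(pPath, uPath))
}

func main() {
	fmt.Println(matchURL("*.aa.com", "www.aa.com"))
	fmt.Println(matchURL("*.bb.com/*", "aa.bb.com/ddx/eee"))
	fmt.Println(matchURL("*.aa.com", "www.aa.com/index"))
}
```

Splitting at the first “/” keeps the two rules apart without a regular expression: the hostname wildcard never sees a “/”, and the pathname wildcard is free to cross further “/” characters.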

How do I do this optimally in Go, reading the files through a buffer? The files are too large to hold in memory.

The present solution is:
Read through file2.txt and, for each line, compare it against every entry in file1.txt, printing the line if there is a possible match. How can this be done optimally? Can the cost be reduced below O(N*K), where N is the number of entries in file1.txt and K is the number of entries in file2.txt?
