lundi 22 mai 2017

Best data-structure/db/binary-format for holding hierarchic elements

I have the following hierarchic entities: Country, City, Street. Every country has cities, every city has streets.

I wrote pseudo-code for what I want to run:

handle_country_before(country)
for city in country:
    handle_city_before(city)
    for street in city:
        handle_street_before(street)
        handle_street_after(street)
    handle_city_after(city)
handle_country_after(country)

I tried the following approaches:

The document(noSql) approach:

I saved all my data in a flat manner:

{
    country : { 
        # Country "x1" info corresponding with the street
    },
    city : {
        # City "y1" info corresponding with the street
    },
    street : {
        # street info...
    }
}

{
    country : { 
        # Country "x1" info corresponding with the street
    },
    city : {
        # City "y1" info corresponding with the street
    },
    street : {
        # street info...
    }
}

{
    country : { 
        # Country "x1" info corresponding with the street
    },
    city : {
        # City "y1" info corresponding with the street
    },
    street : {
        # street info...
    }
}

{
    country : { 
        # Country "x1" info corresponding with the street
    },
    city : {
        # City "y2" info corresponding with the street
    },
    street : {
        # street info...
    }
}

{
    country : { 
        # Country "x1" info corresponding with the street
    },
    city : {
        # City "y2" info corresponding with the street
    },
    street : {
        # street info...
    }
}

Using this method, I had to use the following pseudo-code:

    last_country = 0
    last_city = 0
    last_street = 0
    for element in elements:
        if element.country.id != last_country_id:
            if (0 != last_country) :
                handle_country_after(last_country)
            handle_country_before(element.country)
        if element.city.id != city:
            if (0 != last_city) :
                handle_country_after(last_city)
            handle_country_before(element.city)     
        if element.street.id != street:
            if (0 != last_street) :
                handle_country_after(last_street)
            handle_country_before(element.street)           

The disadvantage: I feel like this approach is a little bit over-kill and the use of the flat structure is not fit for my case, furthermore it was very very slow and space inefficient.

The SQL approach:

I saved each entity in a table: Country, City, Street and iterated it with the following code:

    country_cursor = query('select * from countries')
    for country in country_cursor:
        handle_country_before(country)
        city_cursor = query('select * from cities where parent_country_ref=%s' % (country.id))
        for city in city_cursor:
            street_cursor = query('select * from streets where parent_city_ref=%s' % (city.id))
            ...
            ...
        ...
        handle_country_after(country)

at the beginning, it looked like the best approach. But as I added more metadata tables and had to use JOIN statements it became increasingly slower, then I tried using a materialized view to speed up things a little but got the got the same result as using documents.

The custom format approach:

I tried saving the information in my own binary-serialization format:

<number of countries>[1st-country-data]<number of citieis>[1nd-city-data]<number of streets>[1st-street-data][2nd-street-data][3rd-street...]...

the disadvantage: this couldn't scale, I couldn't update information, I couldn't fetch a particular city/street, every search was O(n).

What I am looking for is a serialization-format/db wich will be:

  1. Able to add/update fields for existing elements

  2. Efficient in speed, space and memory

  3. C compliant (no CPP)

Thanks in advance.

Aucun commentaire:

Enregistrer un commentaire