I have the following hierarchic entities: Country, City, Street. Every country has cities, every city has streets.
I wrote pseudo-code for what I want to run:
handle_country_before(country)
for city in country:
handle_city_before(city)
for street in city:
handle_street_before(street)
handle_street_after(street)
handle_city_after(city)
handle_country_after(country)
I tried the following approaches:
The document
(noSql) approach:
I saved all my data in a flat manner:
{
country : {
# Country "x1" info corresponding with the street
},
city : {
# City "y1" info corresponding with the street
},
street : {
# street info...
}
}
{
country : {
# Country "x1" info corresponding with the street
},
city : {
# City "y1" info corresponding with the street
},
street : {
# street info...
}
}
{
country : {
# Country "x1" info corresponding with the street
},
city : {
# City "y1" info corresponding with the street
},
street : {
# street info...
}
}
{
country : {
# Country "x1" info corresponding with the street
},
city : {
# City "y2" info corresponding with the street
},
street : {
# street info...
}
}
{
country : {
# Country "x1" info corresponding with the street
},
city : {
# City "y2" info corresponding with the street
},
street : {
# street info...
}
}
Using this method, I had to use the following pseudo-code:
last_country = 0
last_city = 0
last_street = 0
for element in elements:
if element.country.id != last_country_id:
if (0 != last_country) :
handle_country_after(last_country)
handle_country_before(element.country)
if element.city.id != city:
if (0 != last_city) :
handle_country_after(last_city)
handle_country_before(element.city)
if element.street.id != street:
if (0 != last_street) :
handle_country_after(last_street)
handle_country_before(element.street)
The disadvantage: I feel like this approach is a little bit over-kill and the use of the flat structure is not fit for my case, furthermore it was very very slow and space inefficient.
The SQL approach:
I saved each entity in a table: Country, City, Street and iterated it with the following code:
country_cursor = query('select * from countries')
for country in country_cursor:
handle_country_before(country)
city_cursor = query('select * from cities where parent_country_ref=%s' % (country.id))
for city in city_cursor:
street_cursor = query('select * from streets where parent_city_ref=%s' % (city.id))
...
...
...
handle_country_after(country)
at the beginning, it looked like the best approach. But as I added more metadata tables and had to use JOIN statements it became increasingly slower, then I tried using a materialized view to speed up things a little but got the got the same result as using documents.
The custom format approach:
I tried saving the information in my own binary-serialization format:
<number of countries>[1st-country-data]<number of citieis>[1nd-city-data]<number of streets>[1st-street-data][2nd-street-data][3rd-street...]...
the disadvantage: this couldn't scale, I couldn't update information, I couldn't fetch a particular city/street, every search was O(n).
What I am looking for is a serialization-format/db wich will be:
-
Able to add/update fields for existing elements
-
Efficient in speed, space and memory
-
C compliant (no CPP)
Thanks in advance.
Aucun commentaire:
Enregistrer un commentaire