We have a C++ importer that loads a large amount of XML data into a database. The number of records in the XML is usually between 500,000 and 2,000,000.
The importer was refactored from the following code into the new code:
Legacy Code
class DataHandler
{
public:
    void handleStartTag()
    {
        _value.clear();
    }
    void handleAttribute(const std::string& attribute)
    {
        _attribute = attribute;
    }
    void handleValue(const std::string& value)
    {
        _value += value;
    }
    void handleEndTag()
    {
        _record.set(_attribute, _value);
        RecordProcessor rp(_record);
        // rp.setSomeProperties()
        rp.run();
    }
private:
    std::string _attribute;
    std::string _value;
    Record _record;
};
New Code
class DataHandler
{
public:
    void handleStartTag()
    {
        _value.clear();
    }
    void handleAttribute(const std::string& attribute)
    {
        _attribute = attribute;
    }
    void handleValue(const std::string& value)
    {
        _value += value;
    }
    void handleEndTag()
    {
        _record.set(_attribute, _value);
        _processor.setSomeProperties();
        _processor.run(_record);
        _processor.clear();
    }
private:
    std::string _attribute;
    std::string _value;
    Record _record;
    RecordProcessor _processor;
};
The main difference is that we no longer create a new instance of RecordProcessor in every call to handleEndTag(). This was implemented because (so the argument goes) creating and destroying the RecordProcessor instance two million times takes a lot of time.
My personal opinion is that the legacy code is the better pattern for implementing this functionality: the RecordProcessor is created directly with the corresponding record, so it can never carry stale state from a previous record.
Questions
Is this really a performance boost?
What are the advantages/disadvantages of both implementations?
What pattern should I use for similar projects in future?