Friday, 27 October 2017

File Deduplication

Deduplication is a specialised form of compression for cases where the granularity of redundancy is large. One of the simplest implementations of deduplication involves three steps:

1. Pick the size of the chunk, which is the granularity at which you want to deduplicate a file.
2. Inspect every non-overlapping chunk in the file (e.g. 0th KB, 1st KB, 2nd KB, … 99th KB) and identify the unique ones.
3. For each unique chunk, take note of where it is found in the file (e.g. 0, 1, 2, 3, … 99).
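The three steps above can be sketched in a few lines of Python. This is only an in-memory illustration: the function name find_unique_chunks and the sample data are hypothetical, and it assumes the whole input fits in a bytes object, which the real task does not allow.

```python
CHUNK_SIZE = 1024  # step 1: the granularity, in bytes


def find_unique_chunks(data, chunk_size=CHUNK_SIZE):
    """Map each distinct chunk to the list of chunk positions where it occurs."""
    positions = {}
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]  # step 2: non-overlapping chunks
        # step 3: remember every position at which this chunk was seen
        positions.setdefault(chunk, []).append(offset // chunk_size)
    return positions


data = b"A" * 1024 + b"B" * 1024 + b"A" * 1024
table = find_unique_chunks(data)
print(len(table))          # 2
print(table[b"A" * 1024])  # [0, 2]
```

Here a three-chunk input contains only two unique chunks, so storing each unique chunk once plus its positions is enough to rebuild the input.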

Task Overview

In this challenge, you'll implement the following two functions. dedup() deduplicates a large input file to a smaller output file. The contents and structure of the output file may vary depending on your implementation. However, the size of this file must be smaller than the size of the input file! redup() uses the output file from the dedup() function to reconstruct (or reduplicate) the original file.

In addition to the code, we also expect you to write a short design document. It should describe your solution at a high level: how your solution works, how your output file is structured, difficulties you had to overcome, kinks you couldn't iron out, etc.
Please include a design document block at the very top of your solution.

Requirements & Specifications
Use a chunk size of 1KB (1024B).
The input file's size is always a multiple of the chunk size.
Assume that each chunk contains random bytes as the data. Hence, a file will be binary.
The input file may be too large to fit into memory.
The output file must be portable across programming languages and operating systems.
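Two of these requirements shape the code directly: the memory limit suggests reading one chunk at a time, and portability suggests writing integers with an explicit byte order rather than the platform's native layout. A minimal sketch follows; the helper names iter_chunks, encode_index and decode_index are hypothetical, and little-endian 32-bit integers are an assumed choice, not something the task prescribes.

```python
import struct

CHUNK_SIZE = 1024


def iter_chunks(stream, chunk_size=CHUNK_SIZE):
    """Yield successive chunks from a binary stream, holding one at a time."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        yield chunk


def encode_index(chunk_ids):
    """Encode chunk ids as little-endian unsigned 32-bit integers."""
    return struct.pack("<%dI" % len(chunk_ids), *chunk_ids)


def decode_index(raw):
    """Inverse of encode_index: bytes back to a list of chunk ids."""
    return list(struct.unpack("<%dI" % (len(raw) // 4), raw))
```

Because the "<" prefix fixes both byte order and field size, any language that can read 4-byte little-endian integers can decode the same bytes.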

Function prototypes

dedup(input_file_path, deduped_file_path);

redup(deduped_file_path, output_file_path);
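One possible way to fill in these two prototypes is sketched below in Python. Everything about the file layout is an assumption made for illustration: a 24-byte header (a hypothetical "DDUP" tag, chunk size, unique-chunk count, total-chunk count), then the unique chunks, then one little-endian 32-bit id per original chunk. The sketch also assumes that a table of SHA-256 digests fits in memory even when the file does not, that two different chunks never share a digest, and that there are fewer than 2^32 unique chunks.

```python
import hashlib
import shutil
import struct
import tempfile

CHUNK_SIZE = 1024
MAGIC = b"DDUP"                    # hypothetical format tag
HEADER = struct.Struct("<4sIQQ")   # tag, chunk size, unique count, total count
INDEX_ENTRY = struct.Struct("<I")  # id of a unique chunk, little-endian


def dedup(input_file_path, deduped_file_path):
    seen = {}  # SHA-256 digest -> id of the unique chunk (assumed to fit in RAM)
    total = 0
    with open(input_file_path, "rb") as src, \
         open(deduped_file_path, "wb") as dst, \
         tempfile.TemporaryFile() as index:
        dst.write(HEADER.pack(MAGIC, CHUNK_SIZE, 0, 0))  # placeholder header
        while True:
            chunk = src.read(CHUNK_SIZE)
            if not chunk:
                break
            digest = hashlib.sha256(chunk).digest()
            chunk_id = seen.get(digest)
            if chunk_id is None:          # first time this chunk appears
                chunk_id = len(seen)
                seen[digest] = chunk_id
                dst.write(chunk)          # store the unique chunk once
            index.write(INDEX_ENTRY.pack(chunk_id))
            total += 1
        index.seek(0)
        shutil.copyfileobj(index, dst)    # append the index after the chunks
        dst.seek(0)
        dst.write(HEADER.pack(MAGIC, CHUNK_SIZE, len(seen), total))


def redup(deduped_file_path, output_file_path):
    with open(deduped_file_path, "rb") as src, \
         open(output_file_path, "wb") as out:
        magic, chunk_size, unique, total = HEADER.unpack(src.read(HEADER.size))
        if magic != MAGIC:
            raise ValueError("not a deduplicated file")
        data_start = HEADER.size
        index_start = data_start + unique * chunk_size
        for i in range(total):
            # Look up which unique chunk belongs at position i, then copy it.
            src.seek(index_start + i * INDEX_ENTRY.size)
            (chunk_id,) = INDEX_ENTRY.unpack(src.read(INDEX_ENTRY.size))
            src.seek(data_start + chunk_id * chunk_size)
            out.write(src.read(chunk_size))
```

The index goes to a temporary file first because the number of unique chunks is unknown until the whole input has been read. Note that with this layout the output is smaller than the input only when chunks actually repeat: if every chunk is unique, the header and index make the output slightly larger. The two seeks per chunk in redup() keep the sketch short at the cost of speed.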

Please help me with this. This is very important for me.
