tl;dr: Is there a good pattern for dynamically saving lots of small JSON files to a filesystem?
I am hitting a REST API once a minute, looking for results from a form submission. Each form result is turned into a JSON
document and saved to a database.
Often there's nothing coming back, and sometimes there are a few tens of documents per hit. The documents end up being very small, roughly 10-50 KB each.
I've been asked to also save each document as a separate gzip file on the filesystem of a different server. The files do not need to be organized; accessing them via os.walk is fine. (I'm using Python.)
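For concreteness, here's roughly how I'd write a single document as its own gzip file (doc_id is a placeholder for whatever unique key the real form results carry):

import gzip
import json
import os

def write_gzipped_doc(doc, target_dir):
    # doc_id stands in for whatever uniquely identifies a form result
    path = os.path.join(target_dir, '{0}.json.gz'.format(doc['doc_id']))
    # text-mode gzip handles encoding and compression transparently
    with gzip.open(path, 'wt', encoding='utf-8') as f:
        json.dump(doc, f)
    return path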
The plan is to create a main directory manually, then add subdirectories as needed by checking whether the last one created has more than 999 files in it:
import os

# '~/data/poll/0000000000' is created manually just before the first run
def create_poll_dir():
    starting_path = os.path.join(os.path.expanduser('~'), 'data', 'poll')
    # find the most recently modified numbered subdirectory
    all_subdirs = [d for d in os.listdir(starting_path)
                   if os.path.isdir(os.path.join(starting_path, d))]
    latest_subdir = max(all_subdirs,
                        key=lambda d: os.path.getmtime(os.path.join(starting_path, d)))
    path = os.path.join(starting_path, latest_subdir)
    # once the current directory holds more than 999 files, roll over to a
    # new zero-padded one
    if len(os.listdir(path)) > 999:
        new_dir = str(int(latest_subdir) + 1).zfill(10)
        path = os.path.join(starting_path, new_dir)
        os.makedirs(path)
    return path
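Each poll would then pick (or roll over) the directory once and gzip every returned document into it, reusing write_gzipped_doc from the sketch above; poll_api stands in for the actual REST call:

import time

def save_poll_results(documents):
    if not documents:  # often a poll returns nothing
        return
    target_dir = create_poll_dir()
    for doc in documents:
        write_gzipped_doc(doc, target_dir)

while True:
    save_poll_results(poll_api())  # poll_api is a placeholder for the real request
    time.sleep(60)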
- Since this is simply for backup, with ease and speed of access not an issue, and the documents are so small, would appending everything to a log-like file in JSON Lines format be a better approach (i.e. less filesystem strain with just as much data integrity)? A sketch of what I mean follows this list.
- If I stay with the current approach (which I know would eventually throw an exception as written), should I use a less-flat organization scheme? And does 1000 seem like a happy medium for the number of small files per directory?
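For comparison, the append-to-one-file alternative I have in mind would look roughly like this; the one-gzipped-file-per-day naming is just an assumption, and I'm writing newline-delimited JSON with the plain json module rather than the jsonlines package:

import datetime
import gzip
import json
import os

def append_poll_results(documents):
    day = datetime.date.today().isoformat()
    path = os.path.join(os.path.expanduser('~'), 'data', 'poll',
                        'poll-{0}.jsonl.gz'.format(day))
    # appending creates concatenated gzip members, which gzip/zcat read back
    # as one continuous stream
    with gzip.open(path, 'at', encoding='utf-8') as f:
        for doc in documents:
            f.write(json.dumps(doc) + '\n')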