Building a repository manifest

Building a manifest for the Brain Image Library

I am currently building a file manifest for the Brain Image Library. This has been a humbling experience for many reasons.

For starters, I thought it would take a few days to compute checksums over all public files. In reality, it took close to a month, since some datasets have over 500K files and some files are close to 14 TB in size. I am down to the last two datasets of this batch, so I am somewhat happy.

→ squeue -u icaoberg
JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
82360     batch  test.sh icaoberg  R 5-08:16:12      1 c02
82359     batch  test.sh icaoberg  R 5-08:16:18      1 c02
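
With files this large, a checksum has to be computed in chunks rather than by reading the whole file into memory. Here is a minimal sketch in Python of that idea. It assumes SHA-256 and a 1 MiB chunk size, and the names `file_checksum` and `build_manifest` are hypothetical; it is not the test.sh script shown in the queue above.

```python
import hashlib
import os


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash a file in fixed-size chunks so memory use stays constant."""
    # Assumption: SHA-256 by default; any name accepted by hashlib.new() works.
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # read() returns b"" at end of file, which stops the iterator.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Yield (relative path, size in bytes, checksum) for each regular file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # Skip broken symlinks and anything that is not a regular file.
            if not os.path.isfile(path):
                continue
            yield os.path.relpath(path, root), os.path.getsize(path), file_checksum(path)
```

Calling `list(build_manifest("/some/dataset"))` would give one record per file. Because each file is hashed independently, the work can be split across jobs per dataset or per directory.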

The next question is how and where to store this manifest. What kind of database technology should I use to retrieve the records efficiently?

I will need to figure it out.