I'm creating a solution for a multi-webserver farm that receives and stores files for remote users. The files arrive at our load balancer in small, HTTPS-encrypted chunks that can be anywhere from 1 KB to over 1024 KB in size, and typically represent a fraction of the overall file being received.
After all the chunks have been received, but before storing the file, we compute a SHA-1 hash of the file the user just sent in. If it matches the hash of something already in storage, we merely store a pointer to the originally stored file instead of storing another copy.
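The dedup decision can be sketched as follows; this is a minimal in-memory model, with hypothetical `blob_store` and `pointers` dicts standing in for the real storage back end:

```python
import hashlib

# Hypothetical stand-ins for the real storage infrastructure:
# blob_store maps a SHA-1 hex digest -> file bytes, and pointers maps
# an upload id -> the digest it resolves to.
blob_store = {}
pointers = {}

def commit_upload(upload_id, data):
    """Store the file if its hash is new; otherwise just record a pointer."""
    digest = hashlib.sha1(data).hexdigest()
    if digest not in blob_store:
        blob_store[digest] = data   # first copy: store the actual bytes
    pointers[upload_id] = digest    # every upload resolves via a pointer
    return digest

a = commit_upload("upload-1", b"hello world")
b = commit_upload("upload-2", b"hello world")   # duplicate content
assert a == b and len(blob_store) == 1          # only one copy stored
```

Either way the upload gets a pointer entry; only the first upload of a given content pays the storage cost.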
(For reasons outside the scope of this discussion, the client is incapable of computing the hash on its own and must send the file to the server.)
The "chunk-storage" infrastructure is sufficiently distributed that a given stream of chunks can be processed on any number of web servers. The same is true of the ASP Session variable, which is stored on a SQL cluster in the farm, so any web server
can pick up where a prior server's session left off.
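The hand-off works out to something like the sketch below, where a plain dict stands in for the SQL-backed session store (the names `session_store` and `handle_chunk` are illustrative, not from the real system):

```python
# A plain dict stands in for the SQL-cluster-backed session store;
# any web server can load a session by id and continue the upload.
session_store = {}

def handle_chunk(server, session_id, chunk_index, chunk):
    # Load (or create) the shared session state for this upload.
    state = session_store.setdefault(
        session_id, {"received": {}, "handled_by": []})
    state["received"][chunk_index] = chunk
    state["handled_by"].append(server)

# Different servers process chunks of the same upload stream.
handle_chunk("web-1", "sess-42", 0, b"first")
handle_chunk("web-2", "sess-42", 1, b"second")

state = session_store["sess-42"]
```

Because all per-upload state lives in the shared store rather than on any one web server, "web-2" can continue exactly where "web-1" left off.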
The SHA-1 computation, however, currently runs only after the last chunk of data has been received, which creates an unnecessary I/O spike and delays the decision to commit the file to storage or use a pointer: the whole file is retrieved from storage and hashed in one pass.
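The spike goes away if each chunk is folded into the hash as it arrives, so the digest is ready the moment the last chunk lands. A minimal sketch using Python's hashlib (the production code would be ASP/.NET; note also that resuming the hash on a different server would require serializing the intermediate hash state, which hashlib does not support out of the box):

```python
import hashlib

chunks = [b"chunk-1 ", b"chunk-2 ", b"chunk-3"]

# Incremental: fold each chunk into the digest as it arrives,
# so no re-read of the assembled file is needed at the end.
h = hashlib.sha1()
for chunk in chunks:
    h.update(chunk)
incremental = h.hexdigest()

# Whole-file hash, computed only to verify the two approaches agree.
whole = hashlib.sha1(b"".join(chunks)).hexdigest()
assert incremental == whole
```

The two digests are identical because SHA-1 is defined over the byte stream, regardless of how that stream is split into updates.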