How to store large files in a database with GridFS in MongoDB?
Sometimes we need to store files in our databases, but MongoDB limits a single BSON document to 16 MB, so a larger file cannot be stored in a normal document. For files above this limit, MongoDB provides GridFS, a specification designed for storing large files.
GridFS divides a file into binary chunks and stores each chunk as a separate document in the database.
GridFS stores files in a bucket, which is a group of MongoDB collections containing the file chunks and the file information. A bucket consists of these collections:
files – stores the file metadata
chunks – stores the binary file chunks
When we store files through a GridFS bucket, the files and chunks collections are created automatically; we do not have to create them ourselves. The default bucket name is fs.
GridFS keeps both collections in the same bucket by prefixing each collection name with the bucket name. With the default bucket name fs, the two collections are:
- fs.files
- fs.chunks
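The naming rule above can be sketched in a few lines of Python. The helper name `bucket_collections` and the bucket name "photos" are hypothetical and used only for illustration; they are not part of any MongoDB driver.

```python
def bucket_collections(bucket_name="fs"):
    """Return the names of the two collections that back a GridFS bucket."""
    # Both collections share the bucket name as a prefix.
    return {
        "files": f"{bucket_name}.files",
        "chunks": f"{bucket_name}.chunks",
    }


default_names = bucket_collections()         # default bucket "fs"
photo_names = bucket_collections("photos")   # "photos" is an example bucket name
```

With a custom bucket name, the same two collections appear under that prefix instead of fs.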
When should we use GridFS?
- When our file system limits the number of files in a directory. GridFS can store as many files as needed.
- When we need to access part of a large file without loading the whole file. GridFS can return sections of a file without reading the entire file into memory.
- When we want to keep files and their metadata in sync across distributed systems.
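Partial reads work because a byte range maps onto a small set of chunks. The sketch below shows only that arithmetic, assuming the default chunk size of 255 * 1024 bytes; a real driver performs this lookup internally, and the function name `chunks_for_range` is hypothetical.

```python
DEFAULT_CHUNK_SIZE = 255 * 1024  # 261,120 bytes, the GridFS default chunk size


def chunks_for_range(offset, length, chunk_size=DEFAULT_CHUNK_SIZE):
    """Return the chunk sequence numbers covering `length` bytes from `offset`."""
    if offset < 0 or length <= 0:
        return []
    first = offset // chunk_size                  # chunk holding the first byte
    last = (offset + length - 1) // chunk_size    # chunk holding the last byte
    return list(range(first, last + 1))


# Reading 1,000 bytes that start 10 MiB into a file touches a single chunk.
needed = chunks_for_range(10 * 1024 * 1024, 1000)
```

Only the chunks in `needed` have to be fetched, no matter how large the whole file is.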
Chunks Collection:-
Each document in the chunks collection represents a distinct chunk of a file stored in GridFS. Documents in this collection have the following structure:
{ "_id" : <ObjectId>, "files_id" : <ObjectId>, "n" : <num>, "data" : <binary> }
A document from the chunks collection can have the following fields:
chunks._id
The unique ObjectId of the chunk.
chunks.files_id
The _id of the parent document, as specified in the files collection.
chunks.n
The sequence number of the chunk. GridFS numbers all chunks, starting with zero.
chunks.data
The chunk's payload as a BSON Binary type.
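As a minimal sketch of how a file's bytes map onto documents of this shape, the following pure-Python example splits a byte string into chunk-like dictionaries and reassembles them. It does not talk to MongoDB; the function name `make_chunk_documents` is hypothetical, a random hex string stands in for the ObjectId, and a tiny chunk size is used so the result is easy to inspect.

```python
import uuid


def make_chunk_documents(data, files_id, chunk_size=255 * 1024):
    """Split `data` into GridFS-style chunk documents (plain dicts)."""
    documents = []
    for n, start in enumerate(range(0, len(data), chunk_size)):
        documents.append({
            "_id": uuid.uuid4().hex,                  # stand-in for an ObjectId
            "files_id": files_id,                     # _id of the parent files document
            "n": n,                                   # sequence number, starting at zero
            "data": data[start:start + chunk_size],   # this chunk's payload
        })
    return documents


payload = b"abcdefghij"  # 10 bytes
chunk_docs = make_chunk_documents(payload, files_id="file-1", chunk_size=4)

# Reassemble the file by ordering the chunks on n and concatenating the payloads.
restored = b"".join(d["data"] for d in sorted(chunk_docs, key=lambda d: d["n"]))
```

Every chunk except the last is exactly chunk_size bytes long; the last one holds whatever remains.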
Files Collection:-
Each document in the files collection represents a file in GridFS.
{ "_id" : <ObjectId>, "length" : <num>, "chunkSize" : <num>, "uploadDate" : <timestamp>, "md5" : <hash>, "filename" : <string>, "contentType" : <string>, "aliases" : <string array>, "metadata" : <any> }
Documents in the files collection can have some or all of the following fields:
files._id
The unique identifier for this document. The _id has whatever data type we chose for the original document. The default type for MongoDB documents is BSON ObjectId.
files.length
The size of the document in bytes.
files.chunkSize
The size of each chunk in bytes. GridFS divides the document into chunks of size chunkSize, except for the last chunk, which is only as large as needed. The default size is 255 kilobytes (255 * 1024 = 261,120 bytes).
files.uploadDate
The date the document was first stored by GridFS. This value has the Date type.
files.md5
Deprecated
The MD5 algorithm is prohibited by FIPS 140-2. MongoDB drivers have deprecated MD5 support and will remove MD5 generation in future releases. Applications that require a file digest should compute it outside of GridFS and store it in files.metadata.
files.filename
Optional. A human-readable name for the GridFS file.
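files.contentType
Deprecated
Optional. A valid MIME type for the GridFS file. For application use only. Use files.metadata to store information related to the MIME type of the GridFS file.
files.metadata
Optional. Any additional information we want to store with the file, such as its MIME type. The metadata field can be of any data type.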
files.aliases
Deprecated
Optional. An array of alias strings. For application use only.
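To tie the fields together, here is a hedged sketch that builds a files-style document as a plain Python dictionary. It leaves out the deprecated md5, contentType, and aliases fields and instead stores a SHA-256 digest and the MIME type under metadata. The function names and the metadata keys `sha256` and `mimeType` are assumptions chosen for this example, not names required by GridFS.

```python
import hashlib
import math
from datetime import datetime, timezone


def make_files_document(file_id, data, filename, chunk_size=255 * 1024, mime_type=None):
    """Build a GridFS-style files document for `data` (illustrative only)."""
    # The digest is computed by the application, outside of GridFS.
    metadata = {"sha256": hashlib.sha256(data).hexdigest()}
    if mime_type is not None:
        metadata["mimeType"] = mime_type
    return {
        "_id": file_id,
        "length": len(data),                       # size in bytes
        "chunkSize": chunk_size,                   # size of each chunk in bytes
        "uploadDate": datetime.now(timezone.utc),
        "filename": filename,
        "metadata": metadata,
    }


def chunk_count(files_doc):
    """Number of chunk documents needed for the file described by `files_doc`."""
    return math.ceil(files_doc["length"] / files_doc["chunkSize"])


doc = make_files_document("file-1", b"x" * 600_000, "report.pdf",
                          mime_type="application/pdf")
```

A 600,000-byte file with the default chunk size needs `chunk_count(doc)` chunks: two full chunks of 261,120 bytes and a final chunk with the remaining 77,760 bytes.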