TL;DR: A filesystem organizes the information on disk so that a large number of files can be read and written.
Information that needs to persist is stored on disk. To store a lot of data there, you need to organize it well. Disk space is divided into fixed-size pages, and the interface provided by the disk hardware can be simplified to two APIs:
byte[] read(long pageNum)
boolean write(long pageNum, byte[] data)
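To make the two calls concrete, here is a minimal in-memory stand-in for a disk. The class name `InMemoryDisk` and the 4096-byte page size are assumptions for illustration only; a real disk driver looks nothing like this.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


class InMemoryDisk:
    """Toy stand-in for a disk: a fixed number of fixed-size pages."""

    def __init__(self, num_pages: int) -> None:
        self.num_pages = num_pages
        self._pages = [bytes(PAGE_SIZE) for _ in range(num_pages)]

    def read(self, page_num: int) -> bytes:
        """Return the full contents of one page."""
        if not 0 <= page_num < self.num_pages:
            raise IndexError(f"page {page_num} out of range")
        return self._pages[page_num]

    def write(self, page_num: int, data: bytes) -> bool:
        """Overwrite one page; return False if the request is invalid."""
        if not 0 <= page_num < self.num_pages or len(data) > PAGE_SIZE:
            return False
        # Pad short writes with zeros so every page stays exactly PAGE_SIZE long.
        self._pages[page_num] = data.ljust(PAGE_SIZE, b"\x00")
        return True
```

Everything is read and written a whole page at a time, which is why the layers above have to think in pages rather than bytes.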
You need to build a system that stores and retrieves a large amount of data with these APIs. We are talking about a file system, which is part of the operating system.
What's a file system? It's a tree structure where there are two types of nodes: directory and file.
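As a rough in-memory picture of that tree, the sketch below models the two node types and walks a path from the root. The names `File`, `Directory`, and `resolve` are hypothetical, and nothing here touches a disk yet.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class File:
    content: bytes = b""


@dataclass
class Directory:
    # Maps a child's name to the child node (either a file or a directory).
    children: Dict[str, Union["Directory", File]] = field(default_factory=dict)


def resolve(root: Directory, path: str) -> Union[Directory, File]:
    """Walk the tree from the root, one path component at a time."""
    node: Union[Directory, File] = root
    for name in path.strip("/").split("/"):
        if name == "":
            continue  # the path was just "/"
        if not isinstance(node, Directory) or name not in node.children:
            raise FileNotFoundError(path)
        node = node.children[name]
    return node
```

For example, with `root = Directory({"home": Directory({"notes.txt": File(b"hi")})})`, calling `resolve(root, "/home/notes.txt")` returns the `File` node.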
I'm just going to put down a theoretical design of how things work below. It may be far from the real thing.
First, we need to keep track of which pages are allocated and which are free.
We could use a bitmap, with one bit per page, to represent that information.
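A minimal sketch of this bookkeeping, assuming the simplest layout of one bit per page in a flat bitmap. The class `PageBitmap` and its first-fit allocation policy are illustrative choices, not how any particular filesystem does it.

```python
class PageBitmap:
    """One bit per page: 1 means allocated, 0 means free."""

    def __init__(self, num_pages: int) -> None:
        self.num_pages = num_pages
        self._bits = bytearray((num_pages + 7) // 8)

    def is_allocated(self, page_num: int) -> bool:
        return bool(self._bits[page_num // 8] & (1 << (page_num % 8)))

    def allocate(self) -> int:
        """Mark the first free page as allocated and return its number."""
        for page_num in range(self.num_pages):
            if not self.is_allocated(page_num):
                self._bits[page_num // 8] |= 1 << (page_num % 8)
                return page_num
        raise RuntimeError("disk is full")

    def free(self, page_num: int) -> None:
        """Clear the page's bit so it can be handed out again."""
        self._bits[page_num // 8] &= ~(1 << (page_num % 8)) & 0xFF
```

The linear scan in `allocate` is the obvious weak spot: it is fine for a toy, but a large disk would want a faster way to find a free page.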
We will reserve a few pages to store system-level information.
The root directory can be stored at a specific, well-known page.
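One way to picture the reserved pages is a small "superblock" at a fixed page that records where everything else lives. The page numbers, the `TOYF` magic value, and the field layout below are all made up for this sketch.

```python
import struct

PAGE_SIZE = 4096      # assumed page size in bytes
SUPERBLOCK_PAGE = 0   # hypothetical: page holding system-level information
BITMAP_PAGE = 1       # hypothetical: page holding the allocation bitmap
ROOT_DIR_PAGE = 2     # hypothetical: well-known page of the root directory

# Hypothetical layout: 4-byte magic, total page count, root directory page.
SUPERBLOCK_FORMAT = "<4sQQ"


def pack_superblock(total_pages: int) -> bytes:
    """Serialize the superblock and pad it to exactly one page."""
    raw = struct.pack(SUPERBLOCK_FORMAT, b"TOYF", total_pages, ROOT_DIR_PAGE)
    return raw.ljust(PAGE_SIZE, b"\x00")


def unpack_superblock(page: bytes) -> dict:
    """Parse a superblock page, rejecting pages without the magic value."""
    size = struct.calcsize(SUPERBLOCK_FORMAT)
    magic, total_pages, root_page = struct.unpack(SUPERBLOCK_FORMAT, page[:size])
    if magic != b"TOYF":
        raise ValueError("not a toy filesystem")
    return {"total_pages": total_pages, "root_dir_page": root_page}
```

Because the superblock sits at a fixed page, mounting the disk only requires reading that one page to find the root directory.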
For a directory type, what exactly is stored there?
For a file type, it is similar.
The above probably describes what an inode is. The inode of a file contains a table of page numbers (similar in role to a file allocation table, or FAT, although FAT proper keeps one table for the whole disk) that indicates where the file's content is stored on disk.
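To illustrate how such a per-file table is used, here is a sketch that reads a byte range by following an inode's list of page numbers. The `Inode` fields, the `read_file` helper, and the dict standing in for the disk are assumptions; real inodes carry much more metadata and use indirect blocks for large files.

```python
from dataclasses import dataclass, field
from typing import Dict, List

PAGE_SIZE = 4  # deliberately tiny so the example is easy to follow


@dataclass
class Inode:
    size: int = 0                                   # file length in bytes
    pages: List[int] = field(default_factory=list)  # page numbers, in file order


def read_file(disk: Dict[int, bytes], inode: Inode, offset: int, length: int) -> bytes:
    """Read up to `length` bytes starting at `offset`, following the page list."""
    end = min(offset + length, inode.size)  # never read past the end of the file
    out = bytearray()
    pos = offset
    while pos < end:
        page_index, page_offset = divmod(pos, PAGE_SIZE)
        chunk = min(PAGE_SIZE - page_offset, end - pos)
        page = disk[inode.pages[page_index]]
        out += page[page_offset:page_offset + chunk]
        pos += chunk
    return bytes(out)
```

Note that the file's pages do not need to be contiguous on disk: with `pages=[7, 3, 9]`, byte 4 of the file lives at the start of page 3.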
open(filepath)
: Open a file so that it is ready for reading or writing.
read()
: Read from a certain position of a file.
write()
: Write to a certain position of a file.

These APIs are exposed as system calls by the kernel. We will look more closely at what exactly happens during these system calls after we introduce the concept of page caches.
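From user space, these calls can be tried out through Python's `os` module, whose `os.open`, `os.lseek`, `os.read`, and `os.write` functions are thin wrappers over the corresponding system calls. The helpers `write_at` and `read_at` are hypothetical names for this sketch, which assumes a Unix-like system.

```python
import os


def write_at(path: str, offset: int, data: bytes) -> int:
    """Open (creating if needed), seek to `offset`, write, and close."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        os.lseek(fd, offset, os.SEEK_SET)
        return os.write(fd, data)  # number of bytes actually written
    finally:
        os.close(fd)


def read_at(path: str, offset: int, length: int) -> bytes:
    """Open read-only, seek to `offset`, read up to `length` bytes, and close."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.lseek(fd, offset, os.SEEK_SET)
        return os.read(fd, length)
    finally:
        os.close(fd)
```

The integer file descriptor returned by `os.open` is the handle the kernel uses to remember which file is open and what the current position is.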
There's also NFS (Network File System), which faces vastly different challenges and warrants a separate post.