Learning Modern Linux - Chapter 5
Filesystems
Some would call this a dry topic, but in Linux and any UNIX based system “everything is a file”, so you should understand what’s going on with files. As the analogy suggests, nearly everything within the kernel is a file (with minor deviations). The filesystem within the kernel defines how each file is laid out.
Basics
Most distros give each user a single filesystem tree starting at / or root. Within each tree is either a directory or file. Easily list all files with ls, change directories with cd, and print your current working directory with cwd. The above are so important and at first, you’ll be using them pretty much all the time. Lastly, file systems are implemented in the kernel, rather than user land.
Some more advanced definitions we should be aware of;
- A drive is either a physical hard disk drive (HDD)/solid-state drive (SSD) or an emulated drive.
- Partitions are split-up segments of a drive, and volumes are similar but formatted for specific systems
- Inodes store metadata about files (size, owner, location, date, permissions). Does not store the filename or the actual file content
- Super blocks are the section in the beginning of a filesystem that captures the metadata of the filesystem
To see what drives/partitions/volumes make up your system you can use lsblk --exclude 7. The --exclude 7 excludes pseudo devices. After that, you can run findmnt -D -t nosquashfs, which will list out the filesystem of the device minus the read-only compressed squashfs types. Another step down, you can find specific information about a file using stat filename.
Files can also be linked to one another. There are two main types of links, Hard Links and Symbolic Links. Hard links are only references to inodes, while Symbolic are actually special files with it’s contents being a direct link to another file. Creating a link is easy, and we can explore the differences between the two in the following example;
$ ln something somealias
$ ln something somesoftalias
$ ls -al *alias
.rw-r--r--@ 15 richie 10 Jun 10:03 somealias
lrwxrwxrwx@ - richie 5 Jul 11:23 somesoftalias -> something
$ stat somealias
File: somealias
Size: 15 Blocks: 8 IO Block: 4096 regular file
Device: 0,44 Inode: 415150 Links: 2
Access: (0644/-rw-r--r--) Uid: ( 1000/ richie) Gid: ( 1000/ richie)
$ stat somesoftalias
File: somesoftalias -> something
Size: 9 Blocks: 8 IO Block: 4096 symbolic link
Device: 0,44 Inode: 488644 Links: 1
Access: (0777/lrwxrwxrwx) Uid: ( 1000/ richie) Gid: ( 1000/ richie)
Virtual File System
Linux does something really neat to be able to abstract out all of the syscalls needed to run your system and provide access to all the parts that make it up; the Virtual File System. This system is the main method a user will interact with files on their Linux machine. At it’s deepest layer, the VFS is responsible for deciding how to read in a file. Since a file is just a string of bytes, it’s the VFS’s job to dictate how that file gets accessed.
There are actually a few filesystems that exist within VFS;
- Local filesystems: (
ext3,XFS,FAT,NTFS) drivers needed to access block devices like HDDs and SSDs - In-memory filesystems: (
tmpfs) Files that exist in RAM - Pseudo filesystems: (
procfs) also in RAM but mainly for kernel interfacing/device abstraction - Networked filesystems: (
NFS) Similar to local, but obviously via a remote device. Covered in more detail in Chapter 7
There are a ton of syscalls (at least a 100) so learning all of them is not entirely necessary. There are 5 categories we break them down into depending on what resources they’re affecting;
- Inodes:
chmod,chown,stat - Files:
open,close,seek,truncate,read,write - Directories:
chdir,getcwd,link,unlink,rename,symlink - Filesystems:
mount,flush,chroot - Others:
nmap,poll,sync,flock
As always, if you come across a command you aren’t familiar with, you can run man <command> to get a rundown of proper use cases.
Logical Volume Manager
One downside of partitions is that they’re difficult to increase and decrease size. Like most examples, this isn’t an everyday issue for a casual user. This doesn’t mean that this won’t happen to you some day; say you invest in a new Linux machine, and create a basic partition for all of your files. If you didn’t designate enough space, getting more would be a real hassle. This is avoided by using a Logical Volume Manager.
Here’s the breakdown, one or more physical volumes (think again of your HDDs or SSDs, even partitions!) create a Volume Group. Each volume group contains one or more Logical Volumes, which are the actual resize-able blocks created and managed via the VG. If you were to run Arch Linux, using a LVM is heavily recommended and something even I’ve done before. I’d give it an easy-medium difficulty rating, considering it’s just a few CLI commands.
Creating Filesystems
Creating a new filesystem is straightforward. It takes two parts; making the filesystem and mounting it. The hardest part of the first step is choosing a filesystem format. We’ll take a look at those in more detail later, but for now, we can run mkfs -t ext4 /dev/some_vg/some_lv. Here, we used mkfs to make a filesystem, told it to use the ext4 format, and told it to create it in a logical volume.
Next, we mount it, or attach it to a filesystem tree. mount takes in both the device you want to attach too and where in the filesystem tree to place it. However, mounts are only active until the device turns off; in order to make them permanent, you would have to edit the fstab file.
Common Filesystem Layouts
How are the files actually stored long term? This is the actual filesystem layout, and there is one readily put in place called the Filesystem Hierarchy Standard or FHS. Obviously, you don’t have to follow this at all and can place files wherever you like, but for content management sake it’s always best practice to stick to the protocol;
bin, sbin: System programs and commands (think ofcat, orls)boot: Kernel images and related componentsdev: Devices like drivers and terminalsetc: System configuration fileshome: User home directorieslib: Shared system librariesmnt, media: Mount points for devices like USB sticks or portable HDDsopt: Depending on distro, package manager files usuallyproc, sys: Kernel interfacestmp: Temporary filesusr: User programs that are usually read onlyvar: User program files, like logs and backups
If you don’t know any of the terms above, don’t worry; that’s sort of the purpose of the rest of this book.
Pseudo Filesystems
We’ve gone over the filesystem from the view of our base machine; just our standard machines HDD or SDD. What if we want to plug in our external SD card? That’s where pseudo filesystems come in. They act like filesystems so we can interact with the content on the device without actually formatting the device. There are actually three major ones Linux has, and we’re going to go through each one.
procfs
Inherited from UNIX, originally mean to contain information about running processes, is the first one we’ll look at. You’ll find two different types of information in here mainly; per-process info with /proc/PID and random info on mounts, networking, memory, up-time, and more. It is honestly kind of a mess, and should be walked through slowly with things like the man pages as a reference.
It is most often useful when low-level debugging or developing system tooling. Here’s an example of getting info on the status command;
$ cat /proc/self/status | head -10
Name: cat
Umask: 0022
State: R (running)
Tgid: 51878
Ngid: 0
Pid: 51878
PPid: 40987
TracerPid: 0
Uid: 1000 1000 1000 1000
Gid: 1000 1000 1000 1000
sysfs
A much more structured directory, meant to help the kernel retrieve specific information about devices using a standardized layout. The directories are the same for every system;
block/: Symbolic links to discovered block devicesbus/: A directory for sub directories of physical bus types supported by your kernelclass/: Device classesdev/: Contains a separateblock/for block devices andchar/for character devices on your systemfirmware/: Management of firmware specific attributesfs/: Sub directories for filesystemsmodule/: Sub directories for each module loaded in your kernel
devfs
This filesystem is mainly for special device files (devices being a wide range of things from physical to software). Block devices are managed here (drives), Character devices like keyboard streams, and other random Special devices like /dev/null, /dev/random, or /dev/tty.
Regular Files
When talking about regular files, there’s no trick here. We are talking about the regular old files we see on a daily basis; documents, text files, images, config files, you name it. These files are again, managed by a filesystem type.
Common Filesystems
While there is no real term for what is a common vs an uncommon filesystem, there is a base amount that are supported right out of the gate within Linux. You can find your machine’s supported filesystems by running cat /proc/filesystems. Let’s talk about some of the most widely used;
ext4: main default in most distros. Backwards compatible withext3, and changes are stored using logs in a process called journaling, where file recovery is fastXFS: Another journaling filesystem used mostly in on workstations in the 90’s, but due to it’s fast IO streams and large file support it can be found in Red Hat distrosZFS: Combines a standard filesystem with a volume manager featureFAT: Mostly used for interoperability between Windows and storage devices
Chapter Review & Resources
lsblk: List block devicesfdisk: Manage partitionsblkid: Show block attributeshwinfo: Show hardware infofile -s: Show filesystem and partition infostat/df -i/ls -i: Show and list inode/file-related info- The Virtual File System abstracts the more complex syscalls needed for file management and access to produce a human-readable format
- See existing mounts with
mount <-t specific filesystem type> - Article on debugging using
/proc
Next: Chapter 6