Very often beginners hear the phrase "Everything is a file on Linux/Unix". But what are directories, then? How are they different from files?
Ubuntu – What are directories, if everything on Linux is a file
Related Solutions
In Unix/Linux, dot-files are files or directories with a . prepended to their name. Examples are ~/.bashrc, ~/.bash_profile, etc. The leading dot . is used as an indicator by software like bash and Nautilus not to list these files normally, but only when they are specifically requested, for example by pressing Ctrl+H in Nautilus. This is because dot-files are generally used to store configuration for different applications, although they are sometimes used for other things as well. For example, Mozilla creates a .mozilla folder which contains its configuration files as well as the browser cache.
People tend to back up and share their dot-files so that others can bootstrap their own applications using those configuration files. An example of a site dedicated to sharing dot-files is http://dotfiles.org.
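To make the "leading dot" convention concrete, here is a minimal Python sketch of what a file lister does when it hides dot-files. The function name split_hidden and the demo filenames are my own, purely for illustration; real tools such as ls or Nautilus have their own implementations.

```python
import os
import tempfile

def split_hidden(directory):
    """Return (visible, hidden) lists of names in one directory, sorted."""
    visible, hidden = [], []
    for name in os.listdir(directory):
        # The only thing that makes an entry "hidden" is its leading dot.
        if name.startswith("."):
            hidden.append(name)
        else:
            visible.append(name)
    return sorted(visible), sorted(hidden)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as demo:
        # Empty placeholder files; the names are examples only.
        for name in (".bashrc", ".mozilla", "notes.txt"):
            open(os.path.join(demo, name), "w").close()
        print(split_hidden(demo))
```

Nothing in the filesystem marks these entries as special; hiding them is purely a decision made by the program that lists them.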
While your question is tagged with bash, it would be somewhat troublesome (in my humble opinion) to use bash for such a task. I'd suggest using Python because it has a lot of good functions for complex tasks, and this answer provides a solution in that language.
Essentially, we use a regex to split filenames at multiple delimiters, keep only the first part, and use the unique set of those first parts as basenames for new directories.
We then traverse the top directory again and sort the files into their appropriate places.
The script doesn't do anything spectacular, and in terms of algorithm analysis it wouldn't do too well because of the nested for loops, but for a "quick and dirty, yet workable" solution it's alright. If you are interested in what each line does, there are plenty of comments explaining the functionality.
Note: the demo only prints the new filenames, for testing purposes. Uncomment the os.rename() part to actually move the files.
The Demo
bash-4.3$ # Same directory structure as in OP example
bash-4.3$ ls TESTDIR
AAA AAA.mkv AAA.nfo AAA-picture.jpg BBB BBB-clip.mp4 BBB.mp4 BBB.srt
bash-4.3$ # now run script
bash-4.3$ ./collate_files.py ./TESTDIR
/home/xieerqi/TESTDIR/AAA/AAA-picture.jpg
/home/xieerqi/TESTDIR/AAA/AAA.mkv
/home/xieerqi/TESTDIR/AAA/AAA.nfo
/home/xieerqi/TESTDIR/BBB/BBB.srt
/home/xieerqi/TESTDIR/BBB/BBB.mp4
/home/xieerqi/TESTDIR/BBB/BBB-clip.mp4
Script itself
#!/usr/bin/env python
import re,sys,os

top_dir = os.path.realpath(sys.argv[1])
# Create list of items in directory first
# splitting names at multiple separators
dir_list = [os.path.join(top_dir,re.split("[.-]",f)[0])
            for f in os.listdir(top_dir)
           ]
# Creating set ensures we will have unique
# directory namings
dir_set = set(dir_list)
# Make these directories first
for dir in dir_set:
    if not os.path.exists(dir):
        os.mkdir(dir)
# now get all files only, no directories
files_list = [f for f in os.listdir(top_dir)
              if os.path.isfile(os.path.join(top_dir,f))
             ]
# Traverse lists of directories and files,
# check if a filename starts with directory
# that we're testing now, and if it does - move
# the file to that directory
for dir in dir_set:
    id_string = os.path.basename(dir)
    for f in files_list:
        filename = os.path.basename(f)
        if filename.startswith(id_string):
            new_path = os.path.join(dir,filename)
            print(new_path)
            # f is relative to top_dir, so the source needs the full path
            #os.rename(os.path.join(top_dir,f),new_path)
Additional notes:
- The script can well be adapted to split files at other separators: in the re.split() function, add whatever characters you want inside the square brackets (meaning "[.-]"). Keep the hyphen last so it is not read as a character range.
- The moving part is performed with the os.rename() function. Alternatively, you could import shutil and use the shutil.move() function. See https://stackoverflow.com/a/8858026/3701431
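Both notes can be sketched together. The following is only an illustration under my own assumptions: the separator set "[._-]" (with an underscore added) and the helper names group_name and move_into_group are hypothetical and not part of the script above, and every filename is assumed to contain at least one separator.

```python
import os
import re
import shutil
import tempfile

# Hypothetical separator set: '.', '_' and '-'. The hyphen stays last so
# the regex engine reads it as a literal character, not as a range.
SEPARATORS = re.compile(r"[._-]")

def group_name(filename):
    """Return the part of the filename before the first separator."""
    return SEPARATORS.split(filename)[0]

def move_into_group(top_dir, filename):
    """Move top_dir/filename into top_dir/<group>/ and return the new path."""
    dest_dir = os.path.join(top_dir, group_name(filename))
    os.makedirs(dest_dir, exist_ok=True)
    # Unlike os.rename(), shutil.move() falls back to copy-and-delete
    # when source and destination are on different filesystems.
    return shutil.move(os.path.join(top_dir, filename), dest_dir)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as top:
        open(os.path.join(top, "AAA-picture.jpg"), "w").close()
        print(move_into_group(top, "AAA-picture.jpg"))
```

When the destination is an existing directory, shutil.move() places the file inside it, which is why the directory is created first.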
Best Answer
Note: originally this was written to support my answer for Why is the current directory in the ls command identified as linked to itself? but I felt that this is a topic that deserves to stand on its own, and hence this Q&A.
Understanding Unix/Linux filesystem and files: Everything is an inode
Essentially, a directory is just a special file, which contains a list of entries and their IDs.
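As a rough illustration of "a directory is just a special file", here is a small Python sketch that asks the kernel for the inode number and type of a path. The helper name describe and the demo filename are my own; the calls themselves (os.lstat, stat.S_ISDIR, stat.S_ISREG) are standard library.

```python
import os
import stat
import tempfile

def describe(path):
    """Return (inode number, file type) from a single lstat() call."""
    info = os.lstat(path)
    if stat.S_ISDIR(info.st_mode):
        kind = "directory"
    elif stat.S_ISREG(info.st_mode):
        kind = "regular file"
    else:
        kind = "other"
    return info.st_ino, kind

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as demo:
        file_path = os.path.join(demo, "example.txt")
        open(file_path, "w").close()
        # Both objects have an inode; only the type bits in the mode differ.
        print(describe(demo))
        print(describe(file_path))
```

The same call works on both paths, which is the point: from the kernel's side, the directory and the regular file are the same kind of object with a different type flag.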
Before we begin the discussion, it's important to make a distinction between a few terms and understand what directories and files really represent. You may have heard the expression "Everything is a file" for Unix/Linux. Well, what users often understand as a file is this: /etc/passwd - an object with a path and a name. In reality, a name (be it of a directory, a file, or whatever else) is just a string of text - a property of the actual object. That object is called an inode, identified by its I-number, and is stored on disk in the inode table. Open programs also have inode tables, but that's not our concern for now.
Unix's notion of a directory is as Ken Thompson put it in a 1989 interview:
An interesting observation can be made from Dennis Ritchie's talk in 1972 that
...but there's no mention of inodes anywhere in the talk. However, the 1971 manual on format of directories states:
So it has been there since the beginning.
Directory and inode pairing is also explained in How are directory structures stored in UNIX filesystem?. A directory itself is a data structure, more specifically a list of objects (filenames and inode numbers) pointing to lists about those objects (permissions, type, owner, size, etc.). So each directory contains its own inode number, and then filenames and their inode numbers. The most famous is inode #2, which is the / directory. (Note, though, that /dev and /run are virtual filesystems, so since they are the root folders of their filesystems, they also have inode 2; i.e. an inode number is unique on its own filesystem, but with multiple filesystems attached, you have non-unique inode numbers.) The diagram borrowed from the linked question probably explains it more succinctly:
All the information stored in the inode can be accessed via stat() system calls, as per Linux man 7 inode:
Is it possible to access a file knowing only its inode number (ref1, ref2)? On some Unix implementations it is possible, but it bypasses permission and access checks, so on Linux it's not implemented, and you have to traverse the filesystem tree (via find <DIR> -inum 1234, for example) to get a filename and its corresponding inode.
On the source code level, the inode is defined in the Linux kernel source and is also adopted by many filesystems that work on Unix/Linux operating systems, including the ext3 and ext4 filesystems (Ubuntu default). Interesting thing: with data being just blocks of information, Linux actually has an inode_init_always function that can determine if an inode is a pipe (inode->i_pipe). Yes, sockets and pipes are technically also files - anonymous files, which may not have a filename on disk. FIFOs and Unix-domain sockets do have filenames on the filesystem.
Data itself may be unique, but the names referring to an inode need not be. If foo refers to inode 123 and we create a hard link to foo called foobar, foobar will point to inode 123 as well. The inode itself contains information as to what actual blocks of disk space are occupied by that data. And that's technically how you can have . linked to the directory's filename. Well, almost: you can't create hard links to directories on Linux yourself, but filesystems can allow hard links to directories in a very disciplined way, which imposes the constraint of having only . and .. as hard links.
Directory Tree
Filesystems implement a directory tree as one of the tree data structures. In particular,
The key point here is that directories themselves are nodes in a tree, and subdirectories are child nodes, with each child having a link back to the parent node. Thus, the link count of a directory's inode is at least 2 for a bare directory (the link from the directory name /home/example/ and the link to self /home/example/.), and each additional subdirectory is an extra link/node:
The diagram found on Ian D. Allen's course page shows this in a simplified, very clear way:
The only thing in the RIGHT diagram that's incorrect is that files aren't technically considered to be on the directory tree itself: adding a file has no effect on the link count.
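The link counting described above can be checked with a short Python sketch. The helper names link_count and same_inode are mine. The numbers you see depend on the filesystem: ext4 and tmpfs are expected to behave as described, while some filesystems (Btrfs, for instance) report a link count of 1 for every directory, so treat the output as illustrative rather than guaranteed.

```python
import os
import tempfile

def link_count(path):
    """Number of hard links pointing at the inode behind path."""
    return os.stat(path).st_nlink

def same_inode(a, b):
    """True if both paths resolve to the same inode on the same device."""
    sa, sb = os.stat(a), os.stat(b)
    return (sa.st_dev, sa.st_ino) == (sb.st_dev, sb.st_ino)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as top:
        example = os.path.join(top, "example")
        os.mkdir(example)
        print("empty directory:", link_count(example))

        open(os.path.join(example, "file.txt"), "w").close()
        print("after adding a file:", link_count(example))

        os.mkdir(os.path.join(example, "sub"))
        print("after adding a subdirectory:", link_count(example))

        # "." inside the directory and ".." inside the child are just
        # further names for the very same inode.
        print(same_inode(example, os.path.join(example, ".")))
        print(same_inode(example, os.path.join(example, "sub", "..")))
```

Comparing the device number together with the inode number matters because, as noted earlier, inode numbers are only unique within one filesystem.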
Accessing directories as if they're files
To quote Linus Torvalds:
Considering that a directory is just a special case of a file, naturally there have to be APIs that allow us to open/read/write/close them in a similar fashion to regular files.
That's where the dirent.h C library header comes into play, which defines the dirent structure, which you can find in man 3 readdir:
Thus, in your C code you have to define struct dirent *entry_p, and when we open a directory with opendir() and start reading it with readdir(), we'll be storing each item in that entry_p structure. Of course, each item will contain the fields defined in the template for dirent shown above.
A practical example of how this works can be found in my answer on How to list files and their inode numbers in the current working directory.
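For readers who would rather not write C, a rough Python analogue of the opendir()/readdir() loop is os.scandir(). This is only a sketch, with list_entries being a name of my own choosing: each entry exposes a name and an inode number, much like d_name and d_ino, although Python leaves out the . and .. entries.

```python
import os
import tempfile

def list_entries(directory):
    """Return (inode, name) pairs for a directory, sorted by name."""
    with os.scandir(directory) as entries:
        # entry.inode() and entry.name play the role of d_ino and d_name.
        pairs = [(entry.inode(), entry.name) for entry in entries]
    return sorted(pairs, key=lambda pair: pair[1])

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as demo:
        open(os.path.join(demo, "a.txt"), "w").close()
        os.mkdir(os.path.join(demo, "subdir"))
        for inode, name in list_entries(demo):
            print(inode, name)
```

Reading a directory this way never touches the contents of the files in it; it only walks the directory's own list of names and inode numbers.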
Note that the POSIX manual on fdopendir states that "[t]he directory entries for dot and dot-dot are optional" and the readdir manual states that struct dirent is only required to have the d_name and d_ino fields.
Note on "writing" to directories: writing to a directory is modifying its "list" of entries. Hence, creating or removing a file is directly associated with directory write permissions, and adding/removing files is the writing operation on said directory.
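One way to see that removing a file is a write to the directory rather than to the file is the following Python sketch, assuming Linux permission semantics and a directory without the sticky bit. The function name remove_read_only_file and the filename are hypothetical. The file is made read-only, yet deleting it still succeeds because only the directory's list of entries is modified.

```python
import os
import stat
import tempfile

def remove_read_only_file(directory):
    """Create a read-only file in directory, delete it, report success."""
    path = os.path.join(directory, "read_only.txt")
    with open(path, "w") as handle:
        handle.write("data\n")
    # The file itself is now read-only for its owner.
    os.chmod(path, stat.S_IRUSR)
    # Removal edits the directory's entry list, so the file's own
    # permission bits are not what decides whether this is allowed.
    os.remove(path)
    return not os.path.exists(path)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as demo:
        print(remove_read_only_file(demo))
```

The reverse also holds on Linux for an unprivileged user: if write permission is taken away from the directory, files inside it can no longer be created or removed, no matter how permissive their own modes are.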