The title of this article is a bit disingenuous, a bit of a liar, because the truth is that you can only restore a deleted file if another running process is still currently using it. What that means, and how to recover the deleted file, is the topic of today’s exciting post.
- What is an Open File?
- Recovering Files
- Other Uses
What is an Open File?
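Every file is backed by an `inode`, a data structure that holds the file's metadata. Some of the metadata includes the `inode` number, the locations on the physical disk where the file has been written and the number of hard links that are pointing to the file. Incidentally, the name of the file is not part of the `inode`; it lives in the directory entry.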
The kernel data structure that contains the filename, cursor position and file mode, et al. is called the `file` structure, and this is created by the kernel when a file is opened. The `file` struct also contains the `*f_op` field, which is a pointer to a `file_operations` struct, a set of file functions (`lseek`, etc.). Pretty cool.
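To poke at some of that inode metadata yourself, here is a minimal sketch. It assumes GNU `stat` (for the `-c` format flag) and works in a scratch directory from `mktemp`; the `president.txt` name is made up for the example.

```shell
# Work in a scratch directory so no real files are touched.
dir=$(mktemp -d)
echo 'bonjour' > "$dir/macron.txt"

# %i = inode number, %h = hard link count, %n = file name (GNU stat).
stat -c 'inode=%i links=%h name=%n' "$dir/macron.txt"

# A hard link is just a second name that points at the same inode.
ln "$dir/macron.txt" "$dir/president.txt"
stat -c 'inode=%i links=%h name=%n' "$dir/macron.txt" "$dir/president.txt"

# Keep the values around so they can be compared.
inode_a=$(stat -c %i "$dir/macron.txt")
inode_b=$(stat -c %i "$dir/president.txt")
links=$(stat -c %h "$dir/macron.txt")

rm -rf "$dir"
```

Both names should report the same inode number, and the link count should go from one to two. The name itself never shows up in the inode; it only exists in the directory.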
The important thing to understand in the context of this article is that even though a file can be “deleted” from a directory, doing so only removes the directory entry that maps the filename to the `inode` number and decrements the hard link count in the `inode`. If the hard link count reaches zero and no process has the file open, then the data blocks are marked as free.
But the actual data is not removed as long as something still has a reference to it, like a process. That process' file descriptor appears in `/proc` as a symbolic link to the “deleted” file, and its contents can then be copied back into the filesystem.
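The behavior described above can be reproduced in a few lines. This is only a sketch: it assumes Linux with `/proc` mounted, GNU `stat`, and a local filesystem under the `mktemp` directory. The shell itself plays the part of the process that keeps the file open.

```shell
dir=$(mktemp -d)
echo 'bonjour' > "$dir/macron.txt"

# Open the file for reading on descriptor 3 of the current shell.
exec 3< "$dir/macron.txt"

# "Delete" it: the directory entry goes away and the link count drops.
rm "$dir/macron.txt"

# /proc/self is the process doing the lookup; the commands below inherit
# descriptor 3 from the shell, so they see the same open file.
link_target=$(readlink /proc/self/fd/3)
echo "$link_target"

# -L follows the link, so this is the link count of the inode itself.
links=$(stat -L -c %h /proc/self/fd/3)
echo "links=$links"

# The data is still readable because the file is still open.
content=$(cat /proc/self/fd/3)
echo "$content"

# Closing the last descriptor is what finally lets the kernel free the blocks.
exec 3<&-
rm -rf "$dir"
```

The link target should end in `(deleted)` and the link count should be zero, yet the content is still there.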
The `lsof` command-line tool stands for list open files. What does `lsof` consider to be a file? According to the man page:
An open file may be a regular file, a directory, a block special file, a character special file, an executing text reference, a library, a stream or a network file (Internet socket, NFS file or UNIX domain socket.) A specific file or all the files in a file system may be selected by path.
Looks like everything (really) is a file in Linux!
Recovering Files
Anyway, let’s look at how to recover an accidentally deleted file. A scenario very similar to the one I’ll describe did happen to me a number of years ago, and thankfully I was able to restore the file, just as we’ll do today.
Let’s say I have a file in the current directory that contains, oh, I don’t know, something about the President of France, and I’m reading it in `less`. Since I’m a cool kid, I use a terminal multiplexer that allows me to have more than one terminal open in my screen, when, all of a sudden, my chubby little fingers crusted with KFC grease `rm` the file.
Luckily, I still have the file open in `less`. But how do I restore it?
Let’s begin with `lsof`. Running `lsof` without any flags or arguments prints every open file on the system, so we simply search the output for the name of our file. This will more than likely take a couple of seconds.
The result of this piped output will show the following:
- the name of the process
- the process ID (`pid`)
- the owner
- the file descriptor (the `r` means that it’s open for reading)
- the type of the node associated with the file (`REG` means that it’s a regular file)
- the file’s major/minor device number
- the size of the file
- the `inode` number of the file
- the full path of the file
    $ lsof | ag macron.txt
    less    1510962 btoll    4r   REG    8,2    304    6555903 /home/btoll/projects/benjamintoll.com/macron.txt (deleted)
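If `lsof` isn't installed, roughly the same information can be dug out of `/proc` directly. The function below is a sketch of that idea, not a replacement for `lsof`; the name `list_deleted_open_files` is made up, and it relies on the kernel tagging unlinked targets with a trailing ` (deleted)`.

```shell
# Print "<pid> <fd> <path>" for every open descriptor whose file was unlinked.
list_deleted_open_files() {
    for fd in /proc/[0-9]*/fd/*; do
        # Descriptors can vanish mid-scan, and other users' are unreadable.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            *' (deleted)')
                pid=${fd#/proc/}      # strip the leading /proc/
                pid=${pid%%/*}        # keep only the pid component
                printf '%s %s %s\n' "$pid" "${fd##*/}" "${target% (deleted)}"
                ;;
        esac
    done
}

list_deleted_open_files
```

Without root you will only see your own processes, and a file whose real name happens to end in ` (deleted)` would be a false positive.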
Now that we know the `pid`, we can list its file descriptors in the `/proc` filesystem:
    $ ls -l /proc/1510962/fd
    total 0
    lrwx------ 1 btoll btoll 64 Aug 30 00:45 0 -> /dev/pts/2
    lrwx------ 1 btoll btoll 64 Aug 30 00:45 1 -> /dev/pts/2
    lrwx------ 1 btoll btoll 64 Aug 30 00:45 2 -> /dev/pts/2
    lr-x------ 1 btoll btoll 64 Aug 30 00:45 3 -> /dev/tty
    lr-x------ 1 btoll btoll 64 Aug 30 00:45 4 -> '/home/btoll/projects/benjamintoll.com/macron.txt (deleted)'
Here, you can see that the file descriptor `4` is a symbolic link to the deleted file:
    $ file /proc/1510962/fd/4
    /proc/1510962/fd/4: symbolic link to /home/btoll/projects/benjamintoll.com/macron.txt (deleted)
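And we can see that some program (well, we know that’s `less`) still has it open, because `stat` shows that the descriptor’s symbolic link still exists in `/proc` (note that, without `-L`, `stat` reports on the link itself rather than on the deleted file):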
    $ stat /proc/1510962/fd/4
      File: /proc/1510962/fd/4 -> /home/btoll/macron.txt (deleted)
      Size: 64          Blocks: 0          IO Block: 1024   symbolic link
    Device: 5h/5d   Inode: 23166603    Links: 1
    Access: (0500/lr-x------)  Uid: ( 1000/   btoll)   Gid: ( 1000/   btoll)
    Access: 2022-08-30 01:29:07.574008620 -0400
    Modify: 2022-08-30 01:28:49.098088543 -0400
    Change: 2022-08-30 01:28:49.098088543 -0400
     Birth: -
The path it points to no longer exists, but that’s ok, because we can still use the file descriptor to copy the contents. After all, its content hasn’t been deleted yet since the `less` process still holds a reference to its inode, so we can copy the bits from disk that the inode references in its data structure.
    $ cp /proc/1510962/fd/4 macron.txt.restored
    $ cat macron.txt.restored
    [original text]
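If you find yourself doing this more than once, the copy step can be wrapped in a small function with a couple of safety checks. This is a sketch under my own assumptions: the name `restore_fd` and its argument order are invented, and it trusts the ` (deleted)` suffix on the link target.

```shell
# restore_fd PID FD DEST
# Copy a deleted-but-still-open file back into the filesystem.
restore_fd() {
    src="/proc/$1/fd/$2"

    # The descriptor has to exist and be readable by us.
    target=$(readlink "$src") || return 1

    # Only proceed if the kernel says the target was unlinked.
    case $target in
        *' (deleted)') ;;
        *) echo "$src is not a deleted file: $target" >&2; return 1 ;;
    esac

    # Refuse to clobber something that is already there.
    if [ -e "$3" ]; then
        echo "$3 already exists" >&2
        return 1
    fi

    cp -- "$src" "$3"
}

# With the pid and descriptor from the article, the call would be:
#   restore_fd 1510962 4 macron.txt.restored
```

Keep in mind that the copy is a snapshot under a brand new inode. If the process is still writing to the file, you may want to copy it again once the process is done.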
Other Uses
There are other nifty uses for `lsof`, as you can probably imagine.
What process has network files with a TCP state of `LISTEN`?
    $ lsof -i tcp -s TCP:LISTEN
    COMMAND  PID  USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
    python3 3987 btoll    3u  IPv4  57565      0t0  TCP *:http-alt (LISTEN)
What process is bound to a specific port?
    $ lsof -i tcp:8080
    COMMAND  PID  USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
    python3 3987 btoll    3u  IPv4  57565      0t0  TCP *:http-alt (LISTEN)
What processes have opened a file?
    $ lsof /tmp/ycm_jgy46s88.log
    COMMAND     PID  USER   FD   TYPE DEVICE SIZE/OFF    NODE NAME
    vim     1544747 btoll    3w   REG    8,2      214 8912927 /tmp/ycm_jgy46s88.log
    $ ps aux | ag 544747
    btoll    1544747  0.0  0.1 112220 25960 pts/1    Sl+  01:59   0:00 vim baseball.py
Of course, now that we have the `pid`, we can get all sorts of useful information from the kernel about the process in `/proc`:
    $ ls -l /proc/1544747/fd
    total 0
    lrwx------ 1 btoll btoll 64 Aug 30 02:00 0 -> /dev/pts/1
    lrwx------ 1 btoll btoll 64 Aug 30 02:00 1 -> /dev/pts/1
    lrwx------ 1 btoll btoll 64 Aug 30 02:00 2 -> /dev/pts/1
    l-wx------ 1 btoll btoll 64 Aug 30 02:00 3 -> /tmp/ycm_jgy46s88.log
    lrwx------ 1 btoll btoll 64 Aug 30 02:00 4 -> /home/btoll/.baseball.py.swp
    lr-x------ 1 btoll btoll 64 Aug 30 02:00 5 -> 'pipe:'
    lr-x------ 1 btoll btoll 64 Aug 30 02:00 7 -> 'pipe:'
    $ ls -l /proc/1544747/exe
    lrwxrwxrwx 1 btoll btoll 0 Aug 30 01:59 /proc/1544747/exe -> /usr/bin/vim.basic*
    $ ls -l /proc/1544747/cwd
    lrwxrwxrwx 1 btoll btoll 0 Aug 30 02:00 /proc/1544747/cwd -> /home/btoll/
    $ cat /proc/1544747/cmdline
    vimbaseball.py
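The `cmdline` output above looks squashed because the arguments in that file are separated by NUL bytes, which the terminal doesn't show. Here is a small sketch that gathers the same `/proc` entries in one place and makes `cmdline` readable; the function name `describe_pid` is made up, and it assumes you have permission to read the process's entries.

```shell
# describe_pid PID: print a few facts about a process straight from /proc.
describe_pid() {
    pid=$1
    printf 'exe:     %s\n' "$(readlink "/proc/$pid/exe")"
    printf 'cwd:     %s\n' "$(readlink "/proc/$pid/cwd")"
    # Arguments in cmdline are NUL-separated; turn the NULs into spaces.
    printf 'cmdline: %s\n' "$(tr '\0' ' ' < "/proc/$pid/cmdline")"
    # One entry per open file descriptor.
    printf 'fds:     %s\n' "$(ls "/proc/$pid/fd" | wc -l)"
}

# $$ is the pid of the current shell.
describe_pid "$$"
```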
Which files does a process have open?
    $ lsof -p 1544747
    COMMAND     PID  USER   FD   TYPE DEVICE SIZE/OFF     NODE NAME
    vim     1544747 btoll  cwd    DIR    8,2     4096  6029314 /home/btoll
    vim     1544747 btoll  rtd    DIR    8,2     4096        2 /
    vim     1544747 btoll  txt    REG    8,2  2906824   263017 /usr/bin/vim.basic
    vim     1544747 btoll  mem    REG    8,2   598104   262790 /usr/lib/x86_64-linux-gnu/libssl.so.1.1
    vim     1544747 btoll  mem    REG    8,2   186344   524727 /usr/lib/python3.8/lib-dynload/_ssl.cpython-38-x86_64-linux-gnu.so
    vim     1544747 btoll  mem    REG    8,2  2954080   262363 /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
    vim     1544747 btoll  mem    REG    8,2   162264 11665647 /lib/x86_64-linux-gnu/liblzma.so.5.2.4
    vim     1544747 btoll  mem    REG    8,2    74848 11665495 /lib/x86_64-linux-gnu/libbz2.so.1.0.4
    ...
Which files does a specific command have open?
    $ lsof -c less
    COMMAND     PID  USER   FD   TYPE DEVICE SIZE/OFF     NODE NAME
    less    1733935 btoll  cwd    DIR    8,2     4096  6029314 /home/btoll
    less    1733935 btoll  rtd    DIR    8,2     4096        2 /
    less    1733935 btoll  txt    REG    8,2   180064   262597 /usr/bin/less
    less    1733935 btoll  mem    REG    8,2  5699248   263354 /usr/lib/locale/locale-archive
    less    1733935 btoll  mem    REG    8,2  2029592 11665763 /lib/x86_64-linux-gnu/libc-2.31.so
    less    1733935 btoll  mem    REG    8,2   192032 11665489 /lib/x86_64-linux-gnu/libtinfo.so.6.2
    less    1733935 btoll  mem    REG    8,2   191504 11665682 /lib/x86_64-linux-gnu/ld-2.31.so
    less    1733935 btoll    0u   CHR  136,4      0t0        7 /dev/pts/4
    less    1733935 btoll    1u   CHR  136,4      0t0        7 /dev/pts/4
    less    1733935 btoll    2u   CHR  136,4      0t0        7 /dev/pts/4
    less    1733935 btoll    3r   CHR    5,0      0t0       13 /dev/tty
    less    1733935 btoll    4r   REG    8,2      608  6036785 /home/btoll/macron.t
Which files does a specific user have open?
    $ lsof -u btoll
    ...
Many of the options can be combined.
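One thing to watch for when combining: as far as I know from the man page, `lsof` ORs its selection options together by default, and the `-a` flag ANDs them instead. The sketch below compares the two using the current user and the current shell's `pid`; it skips quietly if `lsof` isn't installed, and the variable names are only for illustration.

```shell
if command -v lsof >/dev/null 2>&1; then
    # Without -a: files opened by the current user OR by this shell's pid.
    either=$(lsof -u "$(id -un)" -p "$$" 2>/dev/null | wc -l)

    # With -a: files opened by the current user AND by this shell's pid.
    both=$(lsof -a -u "$(id -un)" -p "$$" 2>/dev/null | wc -l)

    echo "ORed: $either lines, ANDed: $both lines"
fi
```

The ANDed listing should never be longer than the ORed one, and on a busy machine it is usually much shorter.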