7 min read

On Recovering Deleted Files with `lsof`

The title of this article is a bit disingenuous, a bit of a liar, because the truth is that you can only restore a deleted file if another running process is still currently using it. What that means, and how to recover the deleted file, is the topic of today’s exciting post.



What is an Open File?

Before we get into the lsof command, it’s important to understand a bit about how the Linux kernel treats an inode and an open file.

Each file has metadata that describes it, and this data is contained in an inode structure. This is the information retrieved by the stat command.

Some of the metadata includes the inode number, the locations on the physical disk where the file has been written and the number of hard links that are pointing to the file. Incidentally, the name of the file is not part of the inode.

The kernel data structure that contains the filename, cursor position and file mode, et al. is called the file structure, and this is created by the kernel when a file is opened.

The file struct also contains the *f_op field which is a file_operations struct, which is a pointer to a set of file functions (open, read, write, mmap, lseek, etc.). Pretty cool.

The important thing to understand in the context of this article is that even though a file can be “deleted” from a directory, it actually only removes the inode number from the directory entry which contains the file and decrements the hard link count in the inode. If the hard link count is zero, then the data blocks are marked as free.

But, the actual data object (structure) itself is not removed as long as something still has a reference to it, like a process. The process' file descriptor that is a symbolic link to the “deleted” file in /proc can then be copied (linked) back into the filesystem.

Neat!

lsof

The lsof command-line tool stands for list open files. What does lsof consider to be a file? According to the man page:

An open file may be a regular file, a directory, a block special file, a character special file, an executing text reference, a library, a stream or a network file (Internet socket, NFS file or UNIX domain socket.) A specific file or all the files in a file system may be selected by path.

Looks like everything (really) is a file in Linux!

Recovering Files

Anyway, let’s look at how to recover an accidentally deleted file. A scenario very similar to the one I’ll describe did happen to me a number of years ago, and thankfully I was able to restore the file, just as we’ll do today.

Let’s say I have file in the current directory that contains, oh, I don’t know, something about the President of France, and I’m reading it in less. Since I’m a cool kid, I use a terminal multiplexer that allows me to have more than one terminal open in my screen, when, all of a sudden, my chubby little fingers crusted with KFC grease rms the file.

Luckily, I still have the file open in less. But, how do I restore it?

Let’s begin with lsof. Using lsof without any flags or arguments will print every open file on the system, and we simply search for the string pattern of our file. This more than likely will take a couple of seconds.

The result of this piped output will show the following:

  • the name of the process
  • the pid (process id)
  • the owner
  • the file descriptor (the r means that it’s a regular file)
  • the type of the node associated with the file
  • the file’s major/minor device number
  • the size of the file
  • the inode number of the file
  • the full path of the file
$ lsof | ag macron.txt
less      1510962                              btoll    4r      REG                8,2       304    6555903 /home/btoll/projects/benjamintoll.com/macron.txt (deleted)

Now, that we know the pid, we can list its file descriptors in the proc pseudo-filesystem:

$ ls -l /proc/1510962/fd
total 0
lrwx------ 1 btoll btoll 64 Aug 30 00:45 0 -> /dev/pts/2
lrwx------ 1 btoll btoll 64 Aug 30 00:45 1 -> /dev/pts/2
lrwx------ 1 btoll btoll 64 Aug 30 00:45 2 -> /dev/pts/2
lr-x------ 1 btoll btoll 64 Aug 30 00:45 3 -> /dev/tty
lr-x------ 1 btoll btoll 64 Aug 30 00:45 4 -> '/home/btoll/projects/benjamintoll.com/macron.txt (deleted)'

Here, you can see that the file descriptor 4 is a symbolic link to the deleted file:

$ file /proc/1510962/fd/4
/proc/1510962/fd/4: symbolic link to /home/btoll/projects/benjamintoll.com/macron.txt (deleted)

And, we can prove that some program (well, we know that’s less) has opened it because we can see that stat still shows one reference to it:

$ stat /proc/1510962/fd/4
  File: /proc/1510962/fd/4 -> /home/btoll/macron.txt (deleted)
  Size: 64              Blocks: 0          IO Block: 1024   symbolic link
Device: 5h/5d   Inode: 23166603    Links: 1
Access: (0500/lr-x------)  Uid: ( 1000/   btoll)   Gid: ( 1000/   btoll)
Access: 2022-08-30 01:29:07.574008620 -0400
Modify: 2022-08-30 01:28:49.098088543 -0400
Change: 2022-08-30 01:28:49.098088543 -0400
 Birth: -

But, that’s ok, because we can still use the file descriptor to copy the contents. After all, its content hasn’t been deleted yet since its inode still contains a reference (the less program), so we can copy the bits from disk that the inode references in its data structure.

$ cp /proc/1510962/fd/4 macron.txt.restored
$ cat macron.txt.restored
[original text]

Other Uses

There are other nifty uses for lsof, as you can probably imagine.

What process has network files with TCP state LISTEN?

$ lsof -i tcp -s TCP:LISTEN
COMMAND  PID  USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
python3 3987 btoll    3u  IPv4  57565      0t0  TCP *:http-alt (LISTEN)

What process is bound to a specific port?

$ lsof -i tcp:8080
COMMAND  PID  USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
python3 3987 btoll    3u  IPv4  57565      0t0  TCP *:http-alt (LISTEN)

What processes have opened a file?

$ lsof /tmp/ycm_jgy46s88.log
COMMAND     PID  USER   FD   TYPE DEVICE SIZE/OFF    NODE NAME
vim     1544747 btoll    3w   REG    8,2      214 8912927 /tmp/ycm_jgy46s88.log
$ ps aux | ag [1]544747
btoll    1544747  0.0  0.1 112220 25960 pts/1    Sl+  01:59   0:00 vim baseball.py

Of course, now that have have the pid, we can get all sorts of useful information from the kernel about the process in /proc:

$ ls - /proc/1544747/fd
total 0
lrwx------ 1 btoll btoll 64 Aug 30 02:00 0 -> /dev/pts/1
lrwx------ 1 btoll btoll 64 Aug 30 02:00 1 -> /dev/pts/1
lrwx------ 1 btoll btoll 64 Aug 30 02:00 2 -> /dev/pts/1
l-wx------ 1 btoll btoll 64 Aug 30 02:00 3 -> /tmp/ycm_jgy46s88.log
lrwx------ 1 btoll btoll 64 Aug 30 02:00 4 -> /home/btoll/.baseball.py.swp
lr-x------ 1 btoll btoll 64 Aug 30 02:00 5 -> 'pipe:[23204158]'
lr-x------ 1 btoll btoll 64 Aug 30 02:00 7 -> 'pipe:[23204159]'
$ ls -l /proc/1544747/exe
lrwxrwxrwx 1 btoll btoll 0 Aug 30 01:59 /proc/1544747/exe -> /usr/bin/vim.basic*
$ ls -l /proc/1544747/cwd
lrwxrwxrwx 1 btoll btoll 0 Aug 30 02:00 /proc/1544747/cwd -> /home/btoll/
$ cat /proc/1544747/cmdline
vimbaseball.py

Which files does a process have open?

By pid:

$ lsof -p 1544747
COMMAND     PID  USER   FD   TYPE DEVICE SIZE/OFF     NODE NAME
vim     1544747 btoll  cwd    DIR    8,2     4096  6029314 /home/btoll
vim     1544747 btoll  rtd    DIR    8,2     4096        2 /
vim     1544747 btoll  txt    REG    8,2  2906824   263017 /usr/bin/vim.basic
vim     1544747 btoll  mem    REG    8,2   598104   262790 /usr/lib/x86_64-linux-gnu/libssl.so.1.1
vim     1544747 btoll  mem    REG    8,2   186344   524727 /usr/lib/python3.8/lib-dynload/_ssl.cpython-38-x86_64-linux-gnu.so
vim     1544747 btoll  mem    REG    8,2  2954080   262363 /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
vim     1544747 btoll  mem    REG    8,2   162264 11665647 /lib/x86_64-linux-gnu/liblzma.so.5.2.4
vim     1544747 btoll  mem    REG    8,2    74848 11665495 /lib/x86_64-linux-gnu/libbz2.so.1.0.4
...

By name:

$ lsof -c less
COMMAND     PID  USER   FD   TYPE DEVICE SIZE/OFF     NODE NAME
less    1733935 btoll  cwd    DIR    8,2     4096  6029314 /home/btoll
less    1733935 btoll  rtd    DIR    8,2     4096        2 /
less    1733935 btoll  txt    REG    8,2   180064   262597 /usr/bin/less
less    1733935 btoll  mem    REG    8,2  5699248   263354 /usr/lib/locale/locale-archive
less    1733935 btoll  mem    REG    8,2  2029592 11665763 /lib/x86_64-linux-gnu/libc-2.31.so
less    1733935 btoll  mem    REG    8,2   192032 11665489 /lib/x86_64-linux-gnu/libtinfo.so.6.2
less    1733935 btoll  mem    REG    8,2   191504 11665682 /lib/x86_64-linux-gnu/ld-2.31.so
less    1733935 btoll    0u   CHR  136,4      0t0        7 /dev/pts/4
less    1733935 btoll    1u   CHR  136,4      0t0        7 /dev/pts/4
less    1733935 btoll    2u   CHR  136,4      0t0        7 /dev/pts/4
less    1733935 btoll    3r   CHR    5,0      0t0       13 /dev/tty
less    1733935 btoll    4r   REG    8,2      608  6036785 /home/btoll/macron.t

Which files does a specific user have open?

$ lsof -u btoll
...

Many of the options can be combined.

Summary

Weeeeeeeeeeeeeeeeeeeeeee

References