Hiding Data in File Slack Space Using Python (Linux)
Authors: Zachary Parish, Aaron Lam, Omid Novtash
Introduction:
When files are written to disk they typically take up more space than their file size. This is because files are allocated a particular number of clusters (also called data units or blocks) on the disk. These clusters are the smallest units of space on disk that the file system will allocate to a file. If a file is smaller than the size of a cluster, or if a file’s size is not divisible by the cluster size, then a section of unused space exists on the disk between the end of the file data and the end of its allocated clusters. This space is called ‘file slack space’ [1,2]. In this article, we will demonstrate how data can be hidden within file slack space and explain the development of a tool to manage the allocation of data in aggregated file slack space.
End Goal:
Our end goal is a Python program that can discover and map all available file slack space on a drive and then manage reading and writing data from this space. Our program is therefore a basic anti-forensics tool which obscures data by hiding it in file slack space. In order to do this we will need to complete a number of steps:
- Find the location and size of file slack space for all files on a drive.
- Determine a method to manage which file slack space regions are allocated and which are free.
- Write data to free slack space.
- Read data from allocated slack space.
- Clear data out of slack space.
We will also need to find a method to aggregate and store the data from multiple files or directories in a single blob that can easily be written to and read from slack space.
Background Information:
Let’s start off by explaining some background information necessary to understand this article.
Drive and File System Basics:
For our purposes, a drive is a physical device that provides some storage space, for instance a Hard Disk Drive (HDD) or a USB drive. A drive can be divided into segments called partitions, which are then formatted with file systems. File systems provide some logical means of managing files on a partition. When a user interacts with a file, they do so through the file system [2].
For example, consider a USB drive with 32 GB of total storage in a single volume. We can partition this volume into a 28 GB partition and a 4 GB partition. We could then format the 4 GB partition with the ext4 file system, and once we mount the drive, use it to store files.
It’s important to note that from a user’s perspective, the file system does not provide any information about where files are physically located on the disk, or even logically located when we consider the disk as a whole. The user interacts with files as logically organized by the file system (e.g., /home/user/file.txt is a file in a user’s home directory), but we cannot tell where on the disk the file is stored.
We can visualize a drive as being a large array of storage locations that we can address linearly. So the first byte of the drive is location 0 and the ninth byte is location 8. For practicality purposes, single bytes on a drive are never addressed in this way. Instead, a drive is divided into sectors (often of 512 bytes) and each of these sectors is given a logical block address (LBA), which is the number of sectors into the drive it sits [2]. For instance, sector 0 would be the first 512 bytes on the drive and sector 1 would be bytes 512 to 1024. File systems usually take this grouping further, and only manage units of storage as ‘clusters’ (also called data units or blocks). These clusters are composed of several sectors grouped together (often 8 sectors are grouped into one cluster, leading to a cluster size of 4096 bytes) [2]. It is from this grouping that file slack space arises.
What is slack space?
Consider the case where we want to store a file that is 96 bytes in size. The file system will locate a free cluster for this file and write its contents into storage. However, if our cluster size is 4096 bytes, then we are left with 4000 bytes of empty space that is allocated to our file, but unused. This space is referred to as file slack space [1]. There are other types of slack space [2], but for our article we will take slack space to mean file slack space. This slack space phenomenon explains why the ‘size on disk’ for a file is often not the same as the file’s actual size. This doesn’t just take place with files smaller than the cluster size: if a file’s size is not evenly divisible by the file system’s cluster size, then the last cluster allocated will always contain some slack space.
Hiding data in slack space utilizes the space between the actual size of the file and what was allocated. We locate a file’s slack space on the drive and then write our data within it. From the perspective of a typical user, this data is not accessible. The size of the file we hid this data in will not change, and neither will the result of hashing the file with utilities like md5sum. It is however important to note that this space is not reliable, as the file system is not aware of the hidden data and will therefore take no steps to avoid overwriting it when files are edited, moved or deleted.
File System Forensics (The Basic Idea):
File system forensics is the process of using analytical techniques on a file system, examining it to recover files, learn details about a specific file or find hidden files/data [3]. This is often done to hard drives or storage devices that are suspected to contain evidence related to a crime; however, it can also be used to recover data that was accidentally deleted or for debugging purposes [3].
Requirements/Assumptions/Limitations:
For practicality purposes, we will focus on a single operating system (Linux) and a single file system (Ext4). To run our program or follow along, you will need to be running Linux (our program was tested on Arch Linux, but Ubuntu or any other Linux distro should be fine). You will also need to have Python 3 installed (only built-in libraries are required). Additionally, the following utilities should be installed: df, debugfs, tar, gzip, dd, fdisk, split, wget, stat (these should be installed by default on most systems). Our demonstration makes use of a 4GB USB drive partition that is formatted with Ext4. We will go over setting up the environment in a later section.
It should again be noted that slack space is not a reliable storage location. Normal usage of a disk (e.g., deleting files, moving files, adding new files) will eventually overwrite slack space. This is because the filesystem is not aware that data is being stored in slack space locations. For instance, if a file is deleted, the clusters it once held, including the slack space, will be marked as free. If another file is written to disk, the filesystem may allocate those blocks to a new file and overwrite the data stored in slack space [2]. As a result, practical usage of this method and our tool requires that a disk be maintained in the state it was when files were written to slack space. In other words, no files should be added, moved or removed from the disk while slack space is occupied. We will test the resilience of our tool to these sorts of actions in a later section. Keep in mind that while this requirement is rather strict, it is consistent with the requirements for hidden volumes like those found in VeraCrypt [4], where a decoy partition that hides the encrypted volume must be left alone to ensure the hidden data is not corrupted.
Preparing the Environment:
This section will explain all of the preparatory steps needed to follow along or run our tool yourself. Recall that you should be running Linux, and that Python 3 and the utilities listed in the requirements section should be installed.
Preparing the disk:
We will start by preparing the drive whose file slack space we will eventually make use of. In our case, the disk is a 4GB partition on a USB drive. We will assume that you have a blank partition on a drive to begin this process.
Using fdisk (“sudo fdisk -l”) we can see our drive; the partition we will use is /dev/sdb1.
We cannot immediately use this drive for our purposes. For consistency’s sake, we need to make sure that the drive does not contain any data left over from previous usage. Recall that when a file is deleted, the data associated with it is not removed from the drive, only the references to those locations [2]. In this way, our new drive may actually still contain pieces of data from previous use. We can remedy this by overwriting all values on the drive with zeros using dd. We can do this with the command “sudo dd if=/dev/zero of=/dev/sdb1 status=progress”, where if is the input we are writing to the disk (/dev/zero), and of is the partition we are writing the zeros to (/dev/sdb1, though you will need to replace this with your device). The time it takes to complete this command will vary based on the size of the drive being zeroed; using “status=progress” will show the progress during the overwrite.
Now that our drive contains only zeros, let’s format it for the ext4 filesystem using mkfs. We use the command “sudo mkfs.ext4 /dev/sdb1”.
In order to mount this drive we will need a mount point. We’ve created one at /mnt/testusb. This can be achieved by using the mkdir command to create a directory in your desired location.
Now we can mount the drive with “sudo mount /dev/sdb1 /mnt/testusb”. If we navigate to the drive, we can see that it contains no files, save for the lost+found folder.
Making Decoy Files:
In order to have slack space which we can occupy, we need the drive to contain files. In practical usage, we’d want to consider the best files for decoy purposes in terms of believability. For our example however, we can simply fill the device with files that will provide ample space. We use wget to download a text file version of a book.
We next use the split utility (“sudo split -l 100 890-0.txt”) to break this large file into many smaller text files. Each of these files will provide us with some usable file slack space.
Now that we have a prepared drive and several files to provide slack space we can begin to start on our tasks.
How do we locate file slack space on a drive?:
Our process for finding the location of slack space works as follows:
- Find the inode associated with each file.
- Use debugfs to find the extents for a file.
- Convert the extents into byte offsets from the start of the drive.
- Read or write to this location by loading the disk in python and seeking the offset location.
We will start with an example of this process from the shell and then explain how this is done programmatically in python.
We first choose which file’s slack space we want to find. Let’s start with the file ‘xaa’. We can see from ls that the size of this file is 3.3K, which is less than the size of the drive’s clusters (4096 bytes), so we know that some file slack space will exist at the end of this file’s allocated cluster. Let’s hash the file so we can tell in the future whether the changes we make to slack space are detectable by reading the file. We use “md5sum xaa”:
We can next use the stat command “stat xaa” to determine the file’s inode number (13) [5].
Now that we have the inode number, how can we determine the actual location of the file’s storage on the drive? We will use the debugfs utility to obtain the file’s ‘extents’ and then convert these values into the byte offset from the start of the disk to the file.
The “-R” argument to debugfs tells the program to execute the following command enclosed in quotation marks and then exit [6]. We specify that it should return the stat values associated with inode number 13 (which is xaa’s inode number). We also specify the disk that we are interested in, in this case /dev/sdb1.
For our purposes, the important information is found in the “EXTENTS” section. A file’s extents are the clusters that are allocated to store that file. In this case, we can see that xaa is stored in only one cluster, numbered “34816”. This makes sense since our cluster size is 4096 bytes, and xaa is only 3.3KB, so it fits entirely in one cluster with some space left over. For comparison let’s look at the debugfs output for inode 12, which is the inode associated with the large text file we split from.
This file required considerably more space to store and we can see that it was assigned clusters from 33281 to 33583.
Now let’s take a look at what these clusters look like when we load the drive in Python. We start by opening Python as root (using sudo) and then opening our drive. We next make use of the seek function to skip to the location on the drive we are interested in, cluster 34816 [7]. The seek function expects a number of bytes to skip, so we must convert our cluster number into bytes by multiplying it by the size of the clusters (in our case, 4096).
Finally, we read 4096 bytes (1 cluster) from the disk at that location:
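In Python, this looks roughly like the following sketch (run as root; the device path and cluster number match our example):

```python
CLUSTER_SIZE = 4096

# open the raw partition read-only; requires root privileges
with open('/dev/sdb1', 'rb') as drive:
    drive.seek(34816 * CLUSTER_SIZE)    # convert the cluster number to a byte offset
    cluster = drive.read(CLUSTER_SIZE)  # read one full cluster

print(cluster)  # the contents of xaa, followed by its slack space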
Here we can see the contents of the xaa file, and the file slack space that exists after the file. At this point, we are ready to write data into slack space! We can start by reopening the drive in binary write mode. We can then seek to the location of our file, and skip 4000 bytes so that we are writing to a location in slack space. Next we will write a short hidden message into the slack space.
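A sketch of that write (the message text is just an example; we open with ‘r+b’ so the device is writable without being truncated):

```python
CLUSTER_SIZE = 4096

with open('/dev/sdb1', 'r+b') as drive:
    drive.seek(34816 * CLUSTER_SIZE + 4000)  # land past xaa's data, inside slack space
    drive.write(b'a hidden message')
```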
If we print out the contents of the cluster again, we will see our hidden message in slack space.
Even though this data now exists within the cluster, it will not be included in the contents of the file. So if we use the ‘cat’ utility to print the xaa file, we will only see the original text content. If we use the ‘ls’ command, we will also see that the file size is unchanged. Finally, let’s use the md5sum command again to see if the hash of our file is changed by our data in slack space.
As expected, the hash is the same as above. Note that this is because the utility is only reading the file’s data, not the entire cluster. Had we hashed the entire cluster, the value would have changed. We’ve now seen how we can locate file slack space and how we can read and write data to it using python. We’ve also seen how this hidden data does not appear when we examine the file size or hash.
Manual Hiding Demo
We Can Read and Write to File Slack Space, What’s Next?
There are still a lot of limitations and usability issues with the technique presented. Recall that our clusters are only 4096 bytes in size. This means that we have at most 4096 bytes minus the file size to hide data within. While this may be alright for very small text files, we won’t be able to hide very much data in this way. What we would like is a way to aggregate the slack space available from all files on the drive so that we can hide larger amounts of data. Our current method also requires us to memorize where the hidden data is stored, either in terms of extents or number of bytes from the start of the disk. This becomes increasingly impractical as we increase the size of data we’d like to hide and begin to span more and more clusters. What we need is a way to keep track of the regions of slack space our data are hidden in. For this purpose, we’ll develop a Slack Space Management System.
Slack Space Management System (SSMS):
Before we begin writing code, let’s start with a high level overview of our slack space management system and how it will behave when presented with a new drive.
The system will start by creating a list of all files on the drive. Next it will compute the amount of slack space each file has by subtracting its file size from the size of the clusters allocated to the file. For simplicity’s sake, we will set a minimum size that we will consider for our units of free slack space. We will call these units ‘slack sectors’. Slack sectors will be 512 bytes in size. So if a file is 4000 bytes and stored in one 4096 byte cluster, we will not use the free 96 bytes for storage. This will help us to have a consistent size of storage space to deal with and should only result in minimal space loss, especially with our test USB drive. Slack sectors will be represented as the number of bytes from the start of the disk to the 512 byte region (the byte offset). The program will generate a list of slack sectors on the disk sorted from lowest (closest to the start of the drive) to highest (closest to the end of the drive).
Once the locations of all slack sectors are known, we will assign the first slack sector to store what we will call the ‘Slack Allocation Table’ or SAT. The SAT is a table, 510 bytes in size, that will keep track of hidden files located in slack space. Each entry in the table has a start slack sector (8 bytes), an end slack sector (8 bytes) and an ID number (1 byte). This means that our table will be able to hold at most 30 entries. While this is a limitation, in practice these ‘hidden files’ will be blobs of data that can contain many files, so there won’t be much need for a large number of entries. Additionally, simple modifications could be made to extend the size of the SAT in future work.
At this point the program will have an ordered list of all slack sectors and an empty SAT. For simplicity we want to make sure that hidden file blobs are allocated in ‘contiguous’ slack sectors, in other words, hidden blobs should not be fragmented in slack space. To achieve this, we will read the slack table and locate the allocated slack sectors (i.e. any sectors that lie between a start and an end in the slack table). Free slack sectors will then be divided into regions of ‘contiguous’ unallocated slack space. With a new drive, all slack sectors will be in a single region that spans the entirety of slack space.
When a blob is written into slack space, the program will first find a region that is large enough to hold the entire blob. Then, a new slack entry will be created using the start and end slack sectors needed for allocation. This entry will be added to the slack table which will then be written to the drive. Finally, the program will write the data from the blob into slack sectors in chunks of 512 bytes.
The next time the program opens, it will repeat the process of finding all slack space sectors, but using the SAT, will be able to tell that the sectors allocated to the previous blob are no longer available. If the previous blob is removed from slack space, the program will remove its entry in the slack table and its region will become available for allocation again.
Coding the SSMS:
We’ll start by creating a class to handle reading and writing from drives. We’ll call this the “drive handler”.
Here’s the class definition and init method for the drive handler class:
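The listings that follow are simplified sketches of the tool rather than exact reproductions (the full source is linked in the conclusion); method and variable names are illustrative. One way the class and its init could look:

```python
import os
import subprocess

class drive_handler:
    SECTOR = 512        # size of one slack sector
    BLOCK_SIZE = 4096   # ext4 cluster size on our test drive

    def __init__(self, mount_path, verbose=False):
        self.mount_path = mount_path
        self.verbose = verbose
        self.device_path = self.get_device_path()
        self.file_paths = []        # files discovered on the drive
        self.slack_locations = []   # byte offsets of every 512-byte slack sector
        self.slack_table = []       # parsed SAT entries: (start, end, id)
        self.entries = {}           # id -> [start, end] lookup
        self.used_locations = []    # slack sectors allocated to hidden blobs
        self.empty_chunks = []      # contiguous runs of free slack sectors
        self.free_space = 0
        self.allocated_space = 0
        self.prepared = False       # becomes True after preparation

    def get_device_path(self):
        # ask df which device is mounted at the mount point
        out = subprocess.run(['df', self.mount_path],
                             capture_output=True, text=True).stdout
        return out.splitlines()[1].split()[0]
```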
To create an instance of the drive_handler class, we’ll need to provide a path to the mount point of the target drive, in our example this will be /mnt/testusb. The program will then derive the device mounted at the mount point using subprocess and the ‘df’ utility through the get_device_path method. This is a common pattern for our tool’s interaction with filesystems. Rather than parse data structures ourselves, we will rely on Linux utilities for well established processes (e.g., getting inode numbers, getting extents).
We can see that when the object is first instantiated, its list of files on the drive is empty, so is the list of empty regions, allocated regions and the totals of free and allocated space. This is because we have yet to load the SAT or read any data from the disk. To denote this, the ‘prepared’ variable is set to False.
To prepare the disk, we will need a method that handles all of the steps we previously described:
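A sketch of this method; map_slack_space here stands in for the slack-location and SAT-parsing code broken down below:

```python
def prepare_hidden_storage(self):
    self.get_files()        # step 1: enumerate files on the drive
    self.map_slack_space()  # step 2: locate slack sectors, parse the SAT
    self.sum_free_space()   # step 3: total up free and allocated space
    self.prepared = True
    if self.verbose:
        print(f'{self.free_space} bytes free across '
              f'{len(self.empty_chunks)} contiguous regions, '
              f'{self.allocated_space} bytes allocated')
```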
This method handles the output of information about the slack space of the drive and makes calls to other methods that handle different preparation steps. We will first look at the ‘get_files’ method, which is in charge of finding all files on the drive.
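A sketch of get_files:

```python
def get_files(self):
    self.file_paths = []
    # walk the mounted filesystem and record every file's absolute path
    for root, dirs, files in os.walk(self.mount_path):
        for name in files:
            self.file_paths.append(os.path.join(root, name))
```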
This simple method makes use of the built-in os.walk [8] function to find all of the files on the drive. We walk starting from the mount point path and record all absolute paths to discovered files. When completed, the method sets the file_paths variable of the drive handler object to a list of absolute file paths.
Next we need to find the slack space locations on the drive, and use the slack table to determine which slack sectors are allocated and which are free. We will also need to group the free slack sectors into regions. This is the center of our slack space allocation system, so we’ll break the method’s code down into smaller segments and explain each in detail.
Let’s start by looking at the first section, where we locate ‘slack space ranges’:
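A condensed sketch of this segment, continuing the illustrative names from above:

```python
def get_slack_ranges(self):
    ranges = []
    for path in self.file_paths:
        info = os.stat(path)   # gives us st_size and st_ino
        # ask debugfs for the stat output (including extents) of this inode
        out = subprocess.run(
            ['debugfs', '-R', f'stat <{info.st_ino}>', self.device_path],
            capture_output=True, text=True).stdout
        # isolate the cluster numbers that follow "EXTENTS:"
        # note: files with multiple extents (fragmented files) are not handled
        extent = [s for s in
                  out.split('EXTENTS:')[1].split(':')[-1].split('\n') if s]
        values = extent[0].split('-')
        if len(values) == 1:
            start_cluster = int(values[0])
            end_cluster = start_cluster + 1        # a single cluster is allocated
        else:
            start_cluster = int(values[0])
            end_cluster = int(values[1]) + 1       # a range of clusters is allocated
        start = start_cluster * self.BLOCK_SIZE    # byte offset of the first cluster
        end = end_cluster * self.BLOCK_SIZE        # byte offset past the last cluster
        file_end = start + info.st_size            # where the real file data stops
        sectors = (end - file_end) // self.SECTOR  # whole slack sectors that fit
        if sectors > 0:
            ranges.append((end - sectors * self.SECTOR, end))
    return ranges
```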
In this section we loop through all files found on the target drive and extract the regions of slack space that exist on the drive. We start by again using the os library [8], this time to use the stat function. This function returns a number of values for a provided file path. We are interested in st_size and st_ino, which are the file size and its inode respectively. With this information we can make a subprocess [9] call to execute the same debugfs command we saw earlier when we manually hid data. We tell subprocess to capture output, which leaves us with a string that we can parse.
We next perform some string manipulation to get the information we want related to the file’s extents. Recall what the output of debugfs looks like:
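Abridged to the EXTENTS section, and reconstructed from the values we saw earlier; for inode 13 (xaa):

```
EXTENTS:
(0):34816
```

and for inode 12 (the full book file):

```
EXTENTS:
(0-302):33281-33583
```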
To extract the extents information we split the string and select everything that comes after “EXTENTS:”, then split again and select everything after the ‘:’ character. We then split on newlines and remove empty strings from the resulting list. We now have an array that contains one string, which is either a single value, as in the inode 13 example, or value-value, as in the inode 12 example. We therefore check how many values we get when the string is split on ‘-’. If there is only one value, then our file is allocated only one cluster; if there are two, then a range of clusters is allocated.
It’s important to note that this does not account for files with multiple extents (i.e., fragmented files). In practice, since one will prepare a drive for this hiding purpose, there should not be any fragmented files, but it is useful to keep in mind that we’re ignoring this possibility with our parsing.
The start value for the range is the first value of the extent and the end value is the final value of the extent plus one. Both of these values are then converted to byte offsets by multiplying them by the drive’s block size.
Recall that we said we would require slack sectors to be of size 512 bytes. In the final part of this snippet, we calculate the location (in bytes) where the actual file data stops (start + size of file), and then calculate the number of slack sectors which can fit in the remaining space: (end - file end) // 512. We then compute the start of the slack space range by subtracting the number of 512 byte slack sectors from the byte offset of the end of the cluster. The byte offsets for both the start of slack space (with our 512 byte requirement) and the cluster end are then appended to the list of ranges.
The next step is to convert these ranges into 512 byte slack sectors:
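A sketch of this conversion:

```python
def ranges_to_sectors(self, ranges):
    self.slack_locations = []
    for start, end in ranges:
        # each range holds a whole number of 512-byte slack sectors
        for offset in range(start, end, self.SECTOR):
            self.slack_locations.append(offset)
    self.slack_locations.sort()  # lowest offset (start of drive) first
```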
We simply loop through all located ranges in blocks of 512 bytes and save the starting byte offsets in the slack space location list. This yields a list of addresses (bytes from start of drive) where 512 byte slack space regions are located.
We now have the locations of file slack space sectors across the entire disk. Next we need to parse the slack table and determine which of these slack sectors are free and which are allocated.
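A sketch of this loading and parsing step (the byte order is an implementation choice; it only has to match between reads and writes):

```python
def load_slack_table(self):
    sat_location = self.slack_locations[0]  # the SAT lives in the first slack sector
    with open(self.device_path, 'rb') as drive:
        drive.seek(sat_location)
        raw = drive.read(510)               # 30 entries of 17 bytes each
    self.slack_table = []
    self.entries = {}
    for i in range(0, 510, 17):
        start = int.from_bytes(raw[i:i + 8], 'big')      # 8-byte start offset
        end = int.from_bytes(raw[i + 8:i + 16], 'big')   # 8-byte end offset
        entry_id = raw[i + 16]                           # 1-byte id
        if entry_id != 0:                   # id 0 marks a blank entry
            self.slack_table.append((start, end, entry_id))
            self.entries[entry_id] = [start, end]
```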
We first load the slack table by opening the drive and seeking to the first slack sector location. Recall from our overview that we will always store the slack allocation table in the first slack space sector.
We then read each 17 byte SAT entry and convert each field to an integer. The ID ‘0’ is reserved for blank entries as empty entries on the slack table will be filled with zeros.
Once all of the entries have been read, we save them to the drive handler’s slack_table variable. We also create a dictionary of entries where the keys are ids and the values are lists of entry starts and ends. This will be used for conveniently looking up allocated regions.
Finally, we can divide the slack sectors into unallocated and allocated groups, and subdivide the unallocated space into contiguous chunks:
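A sketch of this grouping:

```python
# first pass: find which slack sectors fall inside a SAT entry
self.used_locations = []
for loc in self.slack_locations[1:]:          # skip the SAT's own sector
    for start, end in self.entries.values():
        if start <= loc <= end:
            self.used_locations.append(loc)
            break

# second pass: group the remaining sectors into contiguous free chunks
self.empty_chunks, run = [], []
for loc in self.slack_locations[1:]:
    if loc in self.used_locations:
        if run:                               # an allocation breaks the run
            self.empty_chunks.append(run)
            run = []
    else:
        run.append(loc)                       # extend the current free run
if run:
    self.empty_chunks.append(run)
```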
For each location in our list of all slack sectors (note that the slack table’s own sector is not included) we check if the slack sector falls between any of the start and end points from entries in the slack allocation table. If a slack sector falls between a start and an end, then based on our allocation scheme, it must be allocated to a hidden blob already in slack space. We therefore add that slack sector address to the list of used locations.
We then iterate over all slack sector addresses again and this time divide the unallocated slack sectors into free chunks (contiguous regions in slack space). We do this by checking if each address is in the list of used addresses; if it is not, the address is added to a list for the current run. When we reach an allocated address, the list is appended to the list of all regions and we start the aggregation process again at the next unallocated address.
Finally, the allocated space locations and empty space chunk variable for the drive handler object are set with the used locations and empty regions we discovered.
The final part of the drive preparation process is a call to the ‘sum free space’ method.
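A sketch:

```python
def sum_free_space(self):
    self.chunk_sizes = [len(c) * self.SECTOR for c in self.empty_chunks]
    self.free_space = sum(self.chunk_sizes)                    # total free space
    self.allocated_space = len(self.used_locations) * self.SECTOR
```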
This short method simply calculates the total free space in each free chunk, the total free space in all chunks and the total allocated space and sets the appropriate object variables.
At this point, the drive is ready for reading and writing data to slack space. In order to proceed to the reading and writing of data to slack space, we must first define how data will be loaded by our program and how it will be formatted when passed to the drive handler.
The File Handler Class:
We’ll next code a class for handling the files that a user wants to hide in slack space. We have two requirements for this process:
- The user should be able to point the program at a file or at a directory for ease of use. Since the number of entries in the SAT is low, a user should be able to store many files in a single entry. This means that whether the user wants to place a single file or multiple files in an entry, the resulting data passed to the drive handler should be a single unified blob.
- We should reduce the space needed for storing the files as much as possible. Since file slack space is highly constrained, we should compress the data blob to reduce the storage space required.
We can meet both of these requirements with the help of tar and gzip. Tar is a utility for combining multiple files into a single file in a recoverable manner. Gzip is a compression utility that compresses a file to reduce its required space. We can even invoke gzip directly from the tar utility [10]. We will therefore once again rely on Linux utilities within our python program. Let’s look at the code for the file handler:
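A simplified sketch of the class (the staging directory and blob filename are illustrative, and a ./staging directory is assumed to exist):

```python
import os
import subprocess

class file_handler:
    def __init__(self, verbose=False):
        self.verbose = verbose

    def set_files_to_hide(self, path):
        self.files_to_hide = path

    def determine_target_type(self):
        # used only for debug output: are we hiding a file or a directory?
        return 'directory' if os.path.isdir(self.files_to_hide) else 'file'

    def make_blob(self, target_path):
        self.set_files_to_hide(target_path)
        if self.verbose:
            print(f'hiding a {self.determine_target_type()}')
        blob_path = os.path.join('staging', 'blob.tar.gz')
        # tar the target and compress it with gzip (the 'z' flag)
        subprocess.run(['tar', '-czvf', blob_path, target_path], check=True)
        with open(blob_path, 'rb') as f:
            blob = f.read()
        os.remove(blob_path)  # don't leave an unhidden copy lying around
        return blob

    def extract_files_from_blob(self, blob, save_path):
        blob_path = os.path.join('staging', 'blob.tar.gz')
        with open(blob_path, 'wb') as f:
            f.write(blob)
        # untar and decompress into the requested save location
        subprocess.run(['tar', '-xzvf', blob_path, '-C', save_path], check=True)
        os.remove(blob_path)
```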
This time, our init function simply sets the verbosity of the output. Our key functions are ‘make_blob’ and ‘extract_files_from_blob’.
We have two helper functions: ‘set_files_to_hide’, which takes a file path (the path to the file or directory to hide) and stores this in an object variable, and ‘determine_target_type’, which leverages the os library [8] to determine if we are working with a file or directory. Currently the latter function is only used to provide debug output when verbose output is requested. In practice, we treat single files and directories in the same manner.
The make_blob function takes a path to a file or directory and uses subprocess [9] to create a tar file of the selected file/directory. We execute the command ‘tar -czvf blob_path target_file_path’. The ‘z’ flag passed to tar tells it to use gzip to compress the contents of the tar file generated from the target file/directory. Note that we save the resulting tar.gz file in a staging directory that is found in the program’s working directory. Once the blob file is created, we read all of its data into memory and return the resulting bytes object. We lastly delete the blob file from the staging directory; we wouldn’t want an unhidden copy of the data lying around!
The extract_files_from_blob function performs the same actions in reverse. In this case, the function is passed a blob (of type bytes) and saves this file into the staging directory. Subprocess is then invoked to untar and decompress the file. A ‘save_path’ must also be passed to the function which controls where the output file or directory is saved to.
Writing to Slack Space:
Now that we know how the data we will write to slack space will be formatted, we can return to the drive handler class and review the implementation of our read/write operations. Recall that we must call the prepare_hidden_storage function prior to reading and writing so at the time of any read/write operations we will have already loaded the SAT, and the lists of allocated and unallocated slack sectors.
Let’s start off by reviewing the simplest operation we can perform on slack space, a full purge of data:
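A sketch of the purge method:

```python
def purge_slack_space(self):
    zeros = bytes(self.SECTOR)                # 512 bytes of zeros
    with open(self.device_path, 'r+b') as drive:
        drive.seek(self.slack_locations[0])   # overwrite the SAT first
        drive.write(zeros)
        for loc in self.slack_locations[1:]:  # then every other slack sector
            drive.seek(loc)
            drive.write(zeros)
```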
This function is used to clear all data out of slack space, including the SAT. This returns the disk to its original state (assuming slack space was previously filled with zeros).
We open the drive in binary write mode and first seek to the location of the SAT. Recall that this will be the first slack sector. We then simply write 512 bytes of zeros to this location, overwriting the table.
Next we iterate through the list of all slack space locations (offsets from the start of the drive to the location of 512 bytes slack sectors). For each of these we seek to the byte offset and then also overwrite the slack sector with 512 bytes of zeros.
When this process is complete, all slack sectors will be filled with zeros, with any hidden data removed.
Next we will look at the process of saving a slack allocation table to the drive.
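A sketch of the save method:

```python
def save_slack_table(self):
    slack_table_bytes = b''
    for start, end, entry_id in self.slack_table:
        # cast each field to bytes and concatenate: 8 + 8 + 1 bytes per entry
        slack_table_bytes += start.to_bytes(8, 'big')
        slack_table_bytes += end.to_bytes(8, 'big')
        slack_table_bytes += entry_id.to_bytes(1, 'big')
    with open(self.device_path, 'r+b') as drive:
        drive.seek(self.slack_locations[0])
        drive.write(bytes(self.SECTOR))       # clear the previous table
        drive.seek(self.slack_locations[0])
        drive.write(slack_table_bytes)        # save the new table
```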
Recall that the slack table location on the drive is the address of the first slack sector. We create an empty bytes object and then for each entry in our slack table, we cast the integer values for start and end addresses into bytes. We also cast the id to a bytes object. All of these bytes representations are then concatenated and written to the slack_table_bytes variable. We then open the drive and seek to the location of the slack table. We overwrite the entire slack sector with zeros to clear the previous table and then write the slack_table_bytes to the slack table location, saving it to the drive.
At long last, it’s time to see how our main writing function works:
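A condensed sketch of the write method:

```python
def write_blob(self, blob, entry_id):
    if not self.prepared:
        raise RuntimeError('prepare the drive before writing')
    # find a contiguous free chunk large enough to hold the whole blob
    chunk = next((c for c in self.empty_chunks
                  if len(c) * self.SECTOR >= len(blob)), None)
    if chunk is None:
        raise RuntimeError('no free chunk is large enough for this blob')
    if not 1 <= entry_id <= 255:
        raise ValueError('ids must be in [1,255]; 0 marks blank entries')
    chunk_locations = list(chunk)

    # build the location map and the matching 512-byte data blocks
    location_map, data_array = [], []
    for i in range(0, len(blob), self.SECTOR):
        data_array.append(blob[i:i + self.SECTOR])
        location_map.append(chunk_locations.pop(0))

    # record the allocation in the SAT and save it to the drive
    if len(self.slack_table) >= 30:
        raise RuntimeError('the SAT is full')
    self.slack_table.append((location_map[0], location_map[-1], entry_id))
    self.save_slack_table()

    with open(self.device_path, 'r+b') as drive:
        for loc, block in zip(location_map, data_array):
            drive.seek(loc)
            drive.write(block)
            drive.flush()   # flush after each write
```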
The initial code performs some basic feasibility checks to determine if writing is possible. We first check that the disk has been prepared. Afterwards we determine the size of the passed blob and attempt to find a free chunk (collection of contiguous free slack sectors) that is large enough to hold the blob. If we can locate such a chunk, we print some progress information and set our ‘chunk_locations’ (list of byte offsets to slack sectors). The final check is to ensure that the id is valid. To fit in a single byte, the id should be in the range 0 to 255, but recall that we reserve 0 for empty entries so the id must be in [1,255].
We next prepare two lists used during the write process, the location map and the data array. We iterate through values in the range 0 to the length of the blob in increments of 512. For each 512 byte block, we slice the associated data (in bytes) from the blob and append this to the data array. At the same time we pop an address from our list of slack sector addresses and append this to the location map. When this process is complete, the location map will contain a list of all addresses we will save to, and the data array will contain the data to be saved at each address in 512 byte chunks.
We next check that the slack table can fit another entry and if successful, we create a new entry using the first and last address from the location map and the id passed to the function. Our save_slack_table method is then invoked.
Lastly we open the drive in binary write mode, and iterate through both the location map and data array at the same time. For each iteration we seek to the location in the ith index of the location map and then write the data from the ith index of the data array. We flush the drive after each write and finally close the disk.
At this point, the blob has been hidden in aggregated slack space, and its location has been recorded in the SAT located in the first slack sector.
Reading from Slack Space:
Our tool wouldn’t be particularly useful if we could only write but not recover our data from slack space, so let’s look at that function as well:
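A sketch of the read method:

```python
def read_blob(self, entry_id):
    start, end = self.entries[entry_id]        # SAT lookup: where to start and stop
    first = self.used_locations.index(start)
    last = self.used_locations.index(end) + 1  # +1 keeps the blob's last sector
    location_map = self.used_locations[first:last]

    blob = b''
    with open(self.device_path, 'rb') as drive:
        for loc in location_map:               # reassemble the blob in order
            drive.seek(loc)
            blob += drive.read(self.SECTOR)

    with open(self.device_path, 'r+b') as drive:
        for loc in location_map:               # wipe the recovered sectors
            drive.seek(loc)
            drive.write(bytes(self.SECTOR))

    # drop the blob's entry from the SAT and save the updated table
    self.slack_table = [e for e in self.slack_table if e[2] != entry_id]
    del self.entries[entry_id]
    self.save_slack_table()
    return blob
```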
Again recall that the disk is already prepared at this point. This function takes the id of the blob we would like to recover and returns a data array that can be written back to disk using the file handler object we discussed earlier.
We first load a list of all allocated slack locations (generated when we prepared the drive and read the SAT). We can use the slack table dictionary to read the start and end location associated with the id that was passed, telling us where to start and stop reading.
Using these two values, we slice the addresses associated with the blob from the allocated slack locations list. Note that we increment the end index by 1 so that the slice will include the last address of the blob.
The next part is simple: we create a bytes object, then seek to each address in the sliced location map, read 512 bytes, and concatenate them to the bytes object. After this we close the drive. The last section of code opens the drive for writing and once again seeks to each location, but this time overwrites each slack sector with 512 bytes of zeros. We then remove the entry for the blob from the slack table and save the modified SAT.
An End to End Example:
Now that we’ve seen how each component of the drive handler and file handler objects works, let’s see the code that glues them together and manages command line arguments:
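A sketch of this glue code (the exact flag names are illustrative; see the GitHub repository for the real interface):

```python
import argparse

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='hide data in file slack space')
    parser.add_argument('mount_path', help='mount point of the target drive')
    parser.add_argument('--check', action='store_true',
                        help='report free and allocated slack space')
    parser.add_argument('--write', metavar='PATH',
                        help='file or directory to hide')
    parser.add_argument('--read', metavar='SAVE_PATH',
                        help='recover a hidden blob to this path')
    parser.add_argument('--purge', action='store_true',
                        help='zero all slack sectors, including the SAT')
    parser.add_argument('--id', type=int, default=1,
                        help='SAT id of the blob (1-255)')
    args = parser.parse_args()

    drive = drive_handler(args.mount_path, verbose=True)
    drive.prepare_hidden_storage()   # always prepare the disk first
    files = file_handler()
    if args.check:
        pass  # preparation above already reported the state of slack space
    elif args.write:
        drive.write_blob(files.make_blob(args.write), args.id)
    elif args.read:
        files.extract_files_from_blob(drive.read_blob(args.id), args.read)
    elif args.purge:
        drive.purge_slack_space()
```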
This code is simple and allows the user to either check the state of the slack space on drive, write to slack space, read from slack space or purge all data from slack space. We can see that for reading and writing operations we first prepare the disk and then make either a read or write call, using an associated file handler to deal with moving files between the disk and the data blob.
Now let’s see the entire process in action, from preparing the disk to reading and writing values in slack space, both manually and with our tool.
Using Our Tool To Hide Files in Slack Space:
Detecting our Tool:
So far we have explored what file slack space is, how we can read and write to it, and seen how our tool can aggregate this space to provide a managed means for clandestine file storage. Now it’s time to manage our expectations and understand why the tool isn’t a particularly good means of hiding data.
For starters, let’s recall that when we prepared the disk, we ensured that all sectors had zeros written to them. Let’s also note that our purge function, and the clear option of our write function, also write zeros to slack space. In general, we should not expect a drive to be completely zeroed. On a normal USB drive, for instance, we might expect slack space to be filled with various bits and pieces from the files that previously occupied those clusters [2]. In fact, our zeroing of the drive in this manner creates an issue for us: it’s now very clear from an inspection of the raw drive that something is being stored in slack space! If we open the drive for reading in Python and read through the data, we will find that the majority of the drive is zeros, except for files and blocks of data that appear at the end of clusters. A forensic analyst would quickly catch on that something is being hidden in file slack. The uniform 512 byte size of our slack sectors would also reinforce this observation. Rather than random data strewn throughout the drive, blocks of 512 bytes are found located at the end of clusters allocated to files which do not contain that data.
We can remedy these issues with two solutions that complicate our design. First we can tackle the uniformity issue by relaxing our size restrictions (the slack sector size of 512 bytes). Instead of using 512 byte blocks, we can use all the space from the end of the file to the end of the cluster. This will require us to store not only the location of slack sectors, but also their size, which would require changes to the reading and writing operations. Now the analyst will not be presented with uniform blocks of data in file slack space, and we will no longer have a gap between some files and the slack sectors allocated to hidden files (recall the gap between the xaa file and our hidden message). However, it will still be clear that something is hidden on the drive in slack space, as only select locations in slack space will contain anything other than zeros. To fix this issue we should change how we prepare the drive: rather than overwriting with zeros, we should overwrite with random data from /dev/random. Now when the analyst looks at the drive, all file slack space will be filled with some random data and our hidden data will not stand out like it did when the drive was zeroed.
Despite these improvements, further analysis will still show that data is hidden in file slack space. Our data blob, for instance, is a gzip compressed tar file, and analysis of the headers present in the binary data in slack space will confirm this to a skilled analyst. They will then carve slack space [2] and, despite the random data, likely be able to reconstruct our tar.gz file. There’s also the issue of the SAT living in the first slack sector. While this structure is useful for us, it essentially acts as a cheat sheet, letting an analyst know exactly what regions of the disk they should focus on. Our tool itself also presents a stark security risk: there’s nothing stopping our analyst from simply running the tool for themselves and extracting all of our hidden data without much hassle at all. Solving these issues requires a means to disguise our blob data and SAT within the random data that populates the disk. The clearest option is to encrypt both our blob and the SAT before they are written to disk. Then, even if our analyst is suspicious that some data is hidden in file slack space (and seeing that the entire disk is filled with random data is likely to raise some red flags), they will not be able to directly carve and recover it as they had done with the tar file. This solution however relies on the proper implementation of crypto and requires the user to memorize or store a key. As a result of the extra complexity this solution would bring, we leave these improvements for future work.
Future Work:
Future work should first seek to implement the solutions for the issues described above. Beyond this, there are some additional changes that would provide benefit. Firstly, even writing random values to the drive for initialization is suspicious. An ideal solution might be to make the drive appear heavily used to mask our slack space allocations. To do this, we might overwrite the drive using files from the user’s computer, filling the drive with data that look like real files rather than random noise. This would, however, need to be measured against the encryption we perform on the blob data, as blobs of ‘random’ encrypted data might themselves become a focal point.
Throughout our tool we have relied on a number of Linux utilities to manage file system interaction. Of course, this has the obvious negative effect of limiting portability and adding additional requirements to the usage of our tool. The use of subprocess [9] also makes us reliant on a particular output format from these utilities, which should not generally be relied on. A more elegant solution might involve parsing the ext4 data structures ourselves rather than using external tools. For this work, convenience and speed of implementation were chosen over this more rigorous option.
Conclusion:
We hope you have enjoyed this article and found it informative and interesting. Please find the code located on GitHub (https://github.com/exembly/slack_hider). Please also feel free to report issues on GitHub or offer solutions of your own.
References:
1. What is Slack Space? Computerhope.com, 2021. https://www.computerhope.com/jargon/s/slack-space.htm.
2. Salehi-Abari, A. INFR 4690: IT Forensics, Lectures 5–8. 2021.
3. Salehi-Abari, A. INFR 4690: IT Forensics, Lectures 1–2. 2021.
4. VeraCrypt — Hidden Volume. Veracrypt.eu, 2021. https://veracrypt.eu/en/docs/hidden-volume/.
5. Meskes, M. stat — Linux man page. Linux.die.net, 2010. https://linux.die.net/man/1/stat.
6. Ts’o, T. debugfs — Linux man page. Linux.die.net. https://linux.die.net/man/8/debugfs.
7. Built-in Functions — open. Python 3.9.4 documentation. Docs.python.org, 2021. https://docs.python.org/3/library/functions.html#open.
8. os — Miscellaneous operating system interfaces. Python 3.9.4 documentation. Docs.python.org, 2021. https://docs.python.org/3/library/os.html.
9. subprocess — Subprocess management. Python 3.9.4 documentation. Docs.python.org, 2021. https://docs.python.org/3/library/subprocess.html.
10. Gilmore, J. and Fenlason, J. tar — Linux man page. Linux.die.net, 2010. https://linux.die.net/man/1/tar.