“I don’t know, I have no idea. Like with cloth or something? I don’t know how it works digitally at all. I know you want to make a point, I will just repeat what I have said: in order to be cooperative as possible, we have turned over the server … we turned over everything that was work-related. Every single thing.”
Email has been making headlines in the political news lately, so I thought some political observers might be interested in learning more about how it works. I have been responsible for maintaining email servers for over a decade, or as my Congressman would say, I am a nerd. Below is an extremely long and technical explanation of email and email servers. If you are just here for the politics, you may want to skip to the last paragraph.
The Internet is a collection of computers that are able to communicate with each other. We refer to computers that make requests as clients, and computers that listen for requests as servers. If you use a web browser to open a web page, your computer has requested that web page from a server. Just as you have a web browser to request a web page, web servers run web server software to respond to that request. You can turn any computer into a web server by installing and running web server software. Once you have installed and started the software, people can request a web page from your computer by typing the appropriate Internet address in their web browser (assuming it's not blocked by firewalls, routers, etc.).
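To make the client and server roles concrete, here is a minimal sketch using only Python's standard library. It starts a tiny web server on the local machine and then acts as the client that requests a page from it. The handler name, the page text, and the helper function are made up for illustration; a real web server does far more than this.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Server side: answers every GET request with one tiny web page."""

    def do_GET(self):
        body = b"<html><body>Hello from my computer</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_page_from_local_server():
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # Client side: roughly what a browser does when you type an address.
        connection = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        connection.request("GET", "/")
        response = connection.getresponse()
        status, page = response.status, response.read().decode()
        connection.close()
        return status, page
    finally:
        server.shutdown()
        server.server_close()
```

Calling `fetch_page_from_local_server()` returns the status code and the page text. The same computer plays both roles here, which is only possible because "client" and "server" describe software, not hardware.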
When you use an email client such as Outlook, Thunderbird, or Mail to send an email, you type up the email in the client, and when you press "send" the email client contacts the email server and requests that the server accept the email. The email server looks at several factors to decide whether it should accept the email: Is your computer on the corporate network? Is your password correct? Is the server authorized to deliver email to the recipient? If any of those check out, it will usually accept the email. So if I were to send an email to firstname.lastname@example.org, my server would check my username and password, accept the email, and look up the address of the server that accepts email for ldsdems.org (22.214.171.124). It would then contact that server and say, "I have an email for email@example.com." The server would say, "OK, send it to me," and then my server would send a copy of the email to the ldsdems.org email server (with computers EVERYTHING is a copy).
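The client's half of that conversation can be sketched with Python's standard `smtplib` and `email` modules. This is only an illustration under assumptions: the helper names are invented here, port 587 with STARTTLS is a common setup but not necessarily what any particular server uses, and any host name, user name, or password you pass in would be your own.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Compose the email the way a mail client does before you press send."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def recipient_domain(address):
    """The part after the @ decides which server finally receives the mail."""
    return address.rsplit("@", 1)[1].lower()


def submit(msg, host, username, password, port=587):
    """Hand the message to our own mail server, which relays a copy onward."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()                 # encrypt the connection
        server.login(username, password)  # the password check described above
        server.send_message(msg)          # the server accepts or refuses it
```

`submit` needs a real mail server to talk to, so it is not called here. `build_message` and `recipient_domain` run on their own and show what the client prepares and which domain the sending server will look up.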
If someone sends an email to me at firstname.lastname@example.org, my server looks at the address, recognizes the address as OK, accepts the copy, and then saves a copy to the hard drive. There are two standard ways of saving email to a hard drive (plus the Microsoft ways, which I know nothing about). One way is to save everything in one big file. If you were to open it in a text editor (Notepad), you could scroll down and read through your entire inbox. The other standard way is to save each email as a separate file inside a specific folder and its subfolders.
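Assuming the two layouts described above are the ones usually called mbox (one big file) and Maildir (one file per message), Python's standard `mailbox` module can write both. The sketch below stores the same messages each way so the difference on disk is visible. The file names, addresses, and helper functions are invented for the example.

```python
import mailbox
import os
from email.message import EmailMessage


def make_message(subject):
    msg = EmailMessage()
    msg["From"] = "sender@example.com"
    msg["To"] = "me@example.com"
    msg["Subject"] = subject
    msg.set_content("Body of " + subject)
    return msg


def save_both_ways(directory, subjects):
    """Store the same messages as one big file and as one file per message."""
    mbox_path = os.path.join(directory, "inbox.mbox")
    maildir_path = os.path.join(directory, "Maildir")

    big_file = mailbox.mbox(mbox_path)
    per_message = mailbox.Maildir(maildir_path, create=True)
    for subject in subjects:
        big_file.add(make_message(subject))
        per_message.add(make_message(subject))
    big_file.flush()   # make sure the single file is written out
    big_file.close()
    per_message.close()
    return mbox_path, maildir_path
```

After running this on a scratch directory, `inbox.mbox` is a single text file you can read top to bottom, while the `Maildir` folder contains the subfolders `cur`, `new`, and `tmp`, with each new message as its own file under `new`.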
Now of course most people do not read their email with Notepad. They use some type of email client. An email client contacts the server and asks if there is any email stored there for you. Again there are two standard ways (plus the Microsoft way) that these requests for email are made. Post Office Protocol (POP) is the older way; Internet Message Access Protocol (IMAP) is the newer way. With POP all email is copied from the server to the client, and it is often deleted from the server as soon as it has been copied. Back when many people checked their email using a dial-up account this was very convenient, because it allowed users to retrieve all of their email, disconnect, and read it without tying up the phone line. It also freed up hard drive space on the server at a time when hard drive space was more limited than it is now. With IMAP the client copies basic information, such as the subject and the to and from addresses, when it initially retrieves the email. It often waits to retrieve the body of the email (and any attachments) until the user clicks on the subject in the client. The email is kept on the server until the user deletes it and can be retrieved by any client at any point. As most people have come to have dedicated Internet connections and multiple devices, IMAP has become the more common method.
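The difference in behavior can be shown with a toy in-memory model. This is not the real POP or IMAP protocol (Python ships `poplib` and `imaplib` for those); every class and function name below is invented, and the only point is to show what is left on the server afterward.

```python
class ToyMailServer:
    """In-memory stand-in for a mail server's storage (not a real protocol)."""

    def __init__(self, messages):
        # message id -> {"subject": ..., "from": ..., "body": ...}
        self.messages = dict(messages)


def pop_style_fetch(server):
    """POP-like: copy everything to the client, then delete it on the server."""
    local_copy = dict(server.messages)
    server.messages.clear()
    return local_copy


def imap_style_list(server):
    """IMAP-like: fetch only summary information; messages stay on the server."""
    return {
        message_id: {"subject": m["subject"], "from": m["from"]}
        for message_id, m in server.messages.items()
    }


def imap_style_open(server, message_id):
    """IMAP-like: fetch the body only when the user clicks on the message."""
    return server.messages[message_id]["body"]
```

After `pop_style_fetch` the server holds nothing, and the only copy is on the client. After the two IMAP-style calls the server still holds every message, which is why the six month rule discussed next matters so much more today.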
These two protocols came into the news during the Petraeus investigation. There's a statute passed during the 1980s, when POP was the de facto protocol, declaring that email that has not been retrieved and deleted from the server within six months has been abandoned and can be reviewed by law enforcement without a warrant. Now that most users use IMAP, any email that is not deleted and emptied from the trash folder is left on the server. This means law enforcement has legal access to any of your emails that are still on the server and older than six months.
I know many people use webmail instead of traditional email clients. It works the same way, except the web server does all the things the email client normally does (it acts as both a web server and an email client). Sometimes the web server is the same computer as the mail server; other times it's not. Webmail clients use either IMAP or something similar when communicating with the email server.
Now the important part of all of this is that email is saved on a hard drive on a server. I will explain a little bit about how that works. I will use magnetic hard drives for my example, as that is still the most common type of hard drive in use. Magnetic hard drives consist of a set of platters and a little magnetic needle. If you were to pull one apart it would look like a little record player. Each platter consists of tiny regions that can be set to magnetic south or magnetic north. Each letter in your email is saved across eight regions (newer encodings may use up to 32 regions per character in order to account for multiple languages). It works a lot like braille (braille uses six dots per character instead of eight regions).
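A short Python sketch makes the eight regions visible. It assumes, purely for illustration, that a 1 stands for magnetic north and a 0 for magnetic south, and that the text is stored in the UTF-8 encoding; the function name is made up.

```python
def to_regions(text):
    """Show each character as the groups of eight 0/1 regions that store it."""
    return [
        (ch, " ".join(format(byte, "08b") for byte in ch.encode("utf-8")))
        for ch in text
    ]
```

`to_regions("Hi")` gives `[("H", "01001000"), ("i", "01101001")]`, eight regions per letter. An accented letter such as "é" needs two groups of eight, `"11000011 10101001"`, which is how characters outside basic English take up more regions.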
An early and simplistic file system called File Allocation Table (FAT) would simply keep a list of all the files on a hard drive in a specific place on the hard drive. So when someone opened a file called coolstuff.txt, the operating system would look at the list, see which region the file started at, and read the file until it got to the end. When someone deleted the file, it would just delete the entry in the list. So with the proper software tool you could recreate the file's entry in the list and open the file once again, at least until another file happened to be saved in the exact same spot.
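The delete and undelete behavior described above can be modeled in a few lines. This toy is far simpler than real FAT (no clusters, no directories), and the class and method names are invented; it only shows that deleting removes the list entry while the bytes stay put.

```python
class ToyDisk:
    """A drastically simplified FAT-like disk: a file list plus raw storage."""

    def __init__(self, size):
        self.storage = bytearray(size)   # the platter contents
        self.file_list = {}              # name -> (start, length)

    def write(self, name, data, start):
        self.storage[start:start + len(data)] = data
        self.file_list[name] = (start, len(data))

    def read(self, name):
        start, length = self.file_list[name]
        return bytes(self.storage[start:start + length])

    def delete(self, name):
        # Only the list entry goes away; the bytes stay where they were.
        return self.file_list.pop(name)

    def undelete(self, name, start, length):
        # A recovery tool recreates the entry and hopes the bytes are intact.
        self.file_list[name] = (start, length)
```

If you write coolstuff.txt, delete it, and then undelete it with the same start and length, `read` returns the original contents. If another file is written to the same spot in between, the undeleted file contains the other file's bytes instead.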
So when an organization is done with a hard drive and wants to dispose of or transfer it, what can they do to make sure they don't transfer unwanted data? Physical destruction is one option, but simply setting all of the regions to magnetic south (or north) is generally considered effective enough. This process can be referred to as a low-level format, shredding, zeroing out, or wiping. I usually say zeroing out, as I feel it is the least ambiguous. Many tools randomly set the magnetization of the regions rather than setting them all to the same magnetization. Some researchers have claimed to be able to detect the previous setting of a region with 80% accuracy. This is less useful than it seems, because the success rate is bit based, not character based (meaning a "recovered" document would not show 80% of the correct characters). Still, many tools are designed to protect against this by randomly setting the magnetization multiple times. When I worked for a state agency I would use a tool called shred to randomly write to the entire drive twice, and then set all the sectors to zero. It was felt by those involved that this would provide sufficient protection (as those drives may have held Social Security numbers).
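The gap between bit accuracy and character accuracy is simple arithmetic. The sketch below assumes each bit is recovered independently of the others, which is a simplification made here and not a claim from the research; the function name is made up.

```python
def chance_all_bits_correct(bit_accuracy, bits_per_character=8):
    """Probability that every bit of one character is recovered correctly,
    assuming each bit is right or wrong independently of the others."""
    return bit_accuracy ** bits_per_character
```

With 80% accuracy per bit, `chance_all_bits_correct(0.8)` is about 0.168, so under this assumption only about one character in six would come back intact, and the odds of a whole word surviving fall off quickly from there.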
There are software tools that will write random data to the part of a hard drive a file occupies before removing the file from the file list. There are also tools that will write random data to the parts of a hard drive that are not currently being used to store a file. These tools are becoming more commonly used and are sometimes set up to run automatically by system administrators, especially in environments where files may hold sensitive data.
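A minimal sketch of the first kind of tool, overwriting a single file before deleting it, might look like the following. The function names and the choice of two random passes followed by zeros are assumptions made for this example. It also assumes the file system writes the new data over the old location, which is not guaranteed on solid state drives or on journaling and copy-on-write file systems, so treat it as an illustration and not a secure deletion tool.

```python
import os


def overwrite_file(path, passes=2):
    """Overwrite a file's contents in place: random data, then zeros."""
    length = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            f.write(os.urandom(length))
            f.flush()
            os.fsync(f.fileno())   # ask the OS to push the data to the drive
        f.seek(0)
        f.write(b"\x00" * length)
        f.flush()
        os.fsync(f.fileno())


def overwrite_and_delete(path, passes=2):
    overwrite_file(path, passes)
    os.remove(path)   # only now remove the entry from the file list
```

The order matters: the contents are destroyed first, and only then is the file removed from the list, so recreating the list entry afterward would recover nothing but zeros.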
Even if someone has not intentionally overwritten free space on the hard drive, there is no guarantee that a deleted file can be recovered. The operating system will regularly reuse free space (files are usually deleted in order to free up space for more files, so this should be expected). The file system I used as an example above (FAT) is intentionally simplistic. Newer file systems are more complicated, and recovery tools for them can be more difficult to create and often do not exist. Deleted files not being recoverable does not mean the deleted files were intentionally overwritten. The deleted files could have been overwritten by normal use of the system, or there may not be tools to recover deleted files from that particular type of file system.
A server usually has several hard drives. These hard drives can be set up to work together as a Redundant Array of Independent Disks (RAID) to improve speed and provide backup. There are seven standard RAID configurations, which are often used together to create even more variations. When an email is saved to a hard drive using RAID, part of it may be saved to one hard drive and part of it to another; this is done for performance reasons. Also, data calculated by running the two parts of the email through an equation may be saved to a third drive in order to provide a backup in case either of the first two drives fails. I used three drives in my example as a simplified version, as this can be done across a dozen or more drives. It's also common to set up a few drives to work together for certain sets of data and other drives to work together for other data. So you might install your operating system on one set of drives and have your email saved on another set of drives. Any of these hard drives may be inside the computer, or in a separate case across the room or across the country (though usually they are in the same room for performance reasons).
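The three drive example can be sketched in Python. It assumes the "equation" is bytewise XOR, which is what parity based RAID levels such as RAID 5 use, and it splits the data into two halves, whereas real RAID stripes fixed size blocks across drives. The function names are invented.

```python
def split_with_parity(data):
    """Split data across two 'drives' and compute XOR parity for a third."""
    if len(data) % 2:
        data += b"\x00"                  # pad so both halves match in length
    half = len(data) // 2
    drive_a, drive_b = data[:half], data[half:]
    parity = bytes(x ^ y for x, y in zip(drive_a, drive_b))
    return drive_a, drive_b, parity


def rebuild(surviving, parity):
    """Recover a lost half from the surviving half and the parity drive."""
    return bytes(x ^ y for x, y in zip(surviving, parity))
```

If the first drive dies, `rebuild(drive_b, parity)` returns exactly what was on it, and the same works the other way around. Losing two of the three drives at once cannot be repaired this way.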
RAID is generally only used on servers. It is not common on desktop or laptop computers. The IRS hard drive failures were on desktops and laptops, not servers. Hard drive failures are very common, which is one of the reasons RAID was developed. It is generally expected in corporate environments that anything important will be kept on the server, not on desktops or laptops, because the servers have RAID configurations plus additional backups.
There are several types of backup systems that can be used for different purposes. RAID only protects against hard drive failure; it does not protect against viruses or accidental deletion. Another type of backup system is called copy on write. This system is designed not to overwrite old files until explicitly told to. It tends to use up a lot of hard drive space, so it is generally only used for short-term backup. System administrators will often set the system to allow old files to be overwritten after a day, or maybe even an hour. Copy on write is used to protect against accidental deletion or overwriting. Another type of backup is synchronization, where an exact copy of the file system is made so that it can be switched over to immediately upon failure of the primary file system. In this case any files deleted on the primary system are also deleted on the secondary system. There are also archive backups, where a copy of the system is made at specific points in time. These are often done so companies can delete old unused records from active systems but still have access to the old files in case of legal review. I would expect government agencies (such as the IRS) to be required to have this type of system on everything, including email. Maybe they weren't required to but will be required to now. Maybe it was configured in such a way that it missed emails under certain circumstances (deleted same day). Please leave a comment if you know why this didn't happen with the Lerner emails.
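A toy model shows why synchronization and archive backups behave so differently when a file is deleted, and how an archive can miss an email that is deleted the same day. Everything here (class name, method names, file names) is hypothetical and says nothing about how any real agency's system was configured.

```python
import copy


class ToyBackups:
    """Primary file system with a synchronized mirror and dated archives."""

    def __init__(self):
        self.primary = {}
        self.mirror = {}
        self.archives = []        # list of (label, snapshot) pairs

    def save(self, name, contents):
        self.primary[name] = contents
        self.mirror[name] = contents      # synchronization copies every change

    def delete(self, name):
        self.primary.pop(name, None)
        self.mirror.pop(name, None)       # ...including deletions

    def take_archive(self, label):
        self.archives.append((label, copy.deepcopy(self.primary)))

    def find_in_archives(self, name):
        return [label for label, snapshot in self.archives if name in snapshot]
```

A file that exists when an archive is taken can still be found after it is deleted from both the primary and the mirror. A file that is saved and deleted between two archives never appears in any of them.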
An increasingly common situation is called virtualization. A bunch of computers are installed with special software that allows them to work together as one computer. Then special software is installed on top of that which allows them to pretend to be a bunch of computers. This sounds redundant but is done for security and reliability reasons. My personal email "server" works exactly like this. It's a service I rent from a company. There are many companies that offer these services, sometimes charging by the minute. So if I were required to turn over the physical server I couldn't, because it is spread across a bunch of different computers which are shared with other customers of the hosting company. A copy of my virtual server can be made, including the virtual file system, and if copied correctly some "deleted" data may be recovered. If copied in the most efficient manner, however, it could look like I zeroed out the free space.
Many companies sell software that will "wipe" your computer, but they don't always provide clear definitions of what it does, so you can get very different definitions of the term depending on who you ask. So when someone says an entire server was wiped, *I* am not completely sure what that means. If they say all the hard drives were overwritten with random data five times, then I know what they mean.
If you randomly write data across all the hard drives on a server, the server will give an error message when you turn it on; it will say no boot device found. It's very obvious when someone does this to a server. If someone randomizes just the space that is not listed as being used by a file, it is less obvious. The system will still boot and can be used normally. Recovery tools will be unable to recover any deleted files, but that can happen for other reasons as well, such as no files having been deleted recently.
As you can see from the amount of writing above, email is complicated. Political opponents and well-meaning journalists often don't take the time to research the technology in order to ask clear, constructive questions. So while I empathize with those who would claim Hillary Clinton was simply being evasive, I believe her answer about not understanding how it works digitally was honest, and I hope journalists and her political opponents can be honest with themselves that they do not have any idea how it works either. If anyone knows where I can find reporting on the type of server, operating system, MTA, filesystem, or backup system used in these situations, please leave a link in the comments.