A Geek’s Guide to Digital Forensics


>>Today we’ve got Andrew Hoog from Via Forensics,
correct?>>HOOG: That’s correct.
>>All right. And he’s going to give us a brief chat on…
>>HOOG: Digital Forensics.>>Digital Forensics.
>>HOOG: All right, so. Well, I first of all want to thank Google for the invite today.
Most of the companies that we end up working with would like us to go out of business and
instead I met Fitz a little while ago and he said, “Hey, why don’t you come out and
give a talk about Digital Forensics?” And I said, “I’d love to do that.” I said, [INDISTINCT]
lot of people at Google that may not like some of the things that we’ve uncovered, some
of the chances, I–” “It doesn’t matter. Come on, we’d want to hear about it.” So, I want
to thank you guys for the chance to be here. We’re going to talk today about Digital Forensics.
And obviously with the crowd that’s here today, it’s going to be a very technical talk but
if you have any questions in the middle of it, go ahead and interrupt me. Let me know
if you got questions in the middle. Otherwise, we’ll cover some at the end. Via Forensics
has been around since about 2008. My background is a computer scientist. I’ve had various
management roles in different companies and then maybe about 2008, we started Via Forensics.
We’ve got a couple of books. They literally came out this week. One of them is on Android
Forensics, which is–which is my particular specialty. The other one’s on iPhone Forensics,
which I’m sure would be a big hit here. So, those just came out and we focus on both forensics
and data security. We’ve got a couple of patents pending in this space. And I also do quite
a bit of expert witness work, so in the forensics space, you actually have to have certifications
and then you can be an expert in state and federal courts. I’m also [INDISTINCT] like
most of you, I’ve been using [INDISTINCT] for quite some time. And I remember the first
time I ever learned [INDISTINCT] special edition came out with an 800-page book and I literally
opened it up, started at page 1 and went through the whole thing. I was hooked from there and
haven’t looked back since. We do quite a bit of work in the mobile space. I’ll just tell
you a quick story. A gentleman approached us and needed his phone examined, and so we said,
you know, “Okay, but what kind of phone is it? Is it an Android device?” You know, those
guys are rolling out 400,000 devices a day. He said, “No, it’s not an Android device.”
We said, “How about iOS? Is it an iPhone? You’ve got 200 million iPhones out there.” He’s
like, “Oh no, no, no.” He’s like, “I don’t have a smartphone, I have a BlackBerry.”
So, we do quite a bit of work in the mobile space. And again, iPhone and Android is really
where we spend quite a bit of time. Today’s talk is really meant to be an overview of
Digital Forensics. It’s going to be a quick run through. We’re going to skip over some
of the detail and boring stuff. We’re going to jump right into examples and give you–if
you’re interested in tinkering in the space, it’ll give you some things that you can go
back and install on your workstations and start playing around with. But briefly, Digital
Forensics is a science. It’s recognized as a science, which is why I can be an expert in
federal court. And we’re interested in the preservation, analysis and reporting of
digital artifacts. So that would typically cover computers, laptops, obviously
things like thumb drives, USB storage. Mobile phones have become a very, very big deal.
That’s why we chose to specialize in them. It also covers electronic documents that are used in court cases. What
we’ll talk about near the very end is that forensics is typically a reactive science,
so we get called in when there’s been a problem, when there’s been a civil lawsuit or criminal
case, an intrusion, incident response. The big thing that we’re interested in, and it’s kind
of a [INDISTINCT] of the company, is this: we do all the forensics cases and that’s kind of
fun and interesting, you learn a bunch of stuff. What gets really exciting is when you
move forensics out of the reactive and move it into the proactive space. And so near the
end, we’ll cover a couple of topics in mobile app security and in enterprise security that
kind of step outside the typical forensics spot of being reactive, of coming in after
the scene, and instead come in and do things proactively. So real quickly, the three types of storage
devices that we typically deal with. The traditional hard drive, spinning magnetic media, that’s
pretty simple. We can physically disconnect these things, hook up write blockers and deal
with them. Then the solid-state drives that came out: every time a new technology comes out,
there’s a bunch of [INDISTINCT] in the industry that say, “This is going to change. We’re
not going to recover any data. Forensics is over,” and that never happens. So obviously,
we deal with solid-state drives and they have their own kind of host of issues and challenges
that they come with. Where we really play quite a bit is similar to the solid-state
drives but is basically the raw NAND Flash memory. And so this is the type of memory
that you’ll find on a smartphone, on a USB thumb drive, other types of portable devices.
They’re obviously not easy to remove. We have some techniques where you can hook up JTAG
clips, talk to the CPU and basically pull data out in a debug mode. You can also do
chip-off, where you basically take the actual chip off of the PCB, put it into a
special reader and you can pull the data off that way. But for the most part, you’re not
pulling data off of these devices with a physical technique, so you have to come up
with other ways to image them. The other big thing about NAND Flash and the reason why
I spend a lot of time on this is that NAND Flash has really changed the forensics and, I’m
sorry, the security space. So, there are many characteristics. Average NAND Flash can only
take about 10,000 writes. It won’t sustain a charge after that. And for that reason,
the Android team here at Google and then folks at Apple and a number of other companies developed
or chose special file systems that are optimized for NAND storage. And we’ll come to that
in a little while when we talk about the types of data that you can recover off of that.
The other thing too is that in the forensics space, we’re very interested in preserving
a piece of evidence and proving to the court or whomever it may be that we have the exact
copy. With traditional hard drives, that was very simple for us to do. We pull the plug,
hook up a write blocker, and as long as you don’t plug that thing back in, you can image it ten times and
we’ll always be able to verify it and we have a bit-for-bit copy. With NAND Flash memory, thumb
drives, solid-state drives, it’s impossible to do. The reason is that there’s a lot of
drive management and things going on behind the scenes, and that prevents you from getting
an exact copy. So, we’ll talk about some strategies to get around that but NAND Flash memory is
the other big type of storage that we deal with considerably. So, in the forensics space,
we have to talk about how we’re going to acquire our data. There are three primary things that
we do. The simplest approach is essentially working with backup files. We’ll get these in court
cases that involve discovery, where we don’t have to come in and look for deleted
data. They really just need to get a bunch of files out, take a look at them and then
analyze them. We also do this quite a bit on iPhones where we’ll do a backup of an iPhone
and then we’ll basically logically analyze the backup files that came off. You’ll see
email files, Word docs, all different types of documents. It’s the least forensically
sound, it’s the most uninteresting from a technical perspective and it’s probably the
most common thing that people do because most cases simply don’t warrant the other techniques.
A second approach, and we kind of emphasize this in the mobile space, is the logical acquisition.
So, on an iPhone or an Android device, we can pull out data through content providers,
we can pull out data from the Apple backup protocol. But what if we can get into the
phone and basically do a tar gz of the entire file system? Now, I’m not going to get deleted
data, but I can get anything in slash mobile or slash data and everything underneath that.
So, that’s a type of acquisition that we do, and you could do these on Windows computers
too. You are not pulling out all the unallocated space but you are going in there and preserving
the date and time stamps and everything of that sort on the actual file system. So we consider
that a logical acquisition of the device. And then kind of the gold standard of what
we really strive for in the forensics space is a physical acquisition. Physical acquisition
is a bit-for-bit copy of the storage medium at the time we did our acquisition. In the traditional
hard drive space, we can repeat and verify that. But again, with NAND Flash memory and
solid-state drives, we’ll be getting a point-in-time copy. The device, whenever it’s
powered on, even if it’s connected to a write blocker, will always be changing behind the
scenes. The nice thing about a physical acquisition is that we can easily recover deleted data
out of those. And there are some specialized tools, some specialized software and hardware
that we can use in order to do the physical acquisition. But in terms of just tinkering,
anybody could hook up a drive and do a physical acquisition with basically, say, a USB
adapter that you hook up to a Linux box. We’ll go through some of the tools that you can
use to do that. So physical acquisition is really kind of the gold standard of what
we’re looking for. Now, how do we do the verification that we have an exact
copy? Well, it’s very simple, we simply use hash values. It’s accepted in court, everybody
eventually knows how they work. For those that may not be familiar, it’s just a hex
value that’s calculated from some sort of input of data. The nice thing about the hash
value, of course, is that a single byte change in your source data will have an avalanche
effect and you’ll get a radically different hash value. So, we use hash values, they’re
admissible in court and that allows us to say the [INDISTINCT] that are identical, I have an
exact copy of the original. I did all my investigation on the copy. Now, we really don’t
have to produce that physical media every single time we do
referencing. And again, this is the challenge with NAND Flash. In the mobile space,
we just can’t get a hash signature to stay the same if we image that same device [INDISTINCT]
times. So you kind of do a point-in-time hash signature and basically say this is what it
was, the data hasn’t changed since then and we’re going to operate off of that data
set. Two common ones: MD5 is what most people use. The forensics folks are starting to use
SHA-256 because there’s a possibility of some collisions now that the number of files
has increased. And again, for anybody that hasn’t seen an MD5, if you took my name
and ran it through MD5, here’s the hex signature at the bottom of this slide that you would
get out for that particular data set. We do this on entire drives. Yes?
>>Sorry, so–I apologize for [INDISTINCT] my question, so, I know that if you got [INDISTINCT]
that there’s something we’re allowed to [INDISTINCT] like move things around it that’s additional
writes that would explain why you’re not getting any [INDISTINCT] Hash?
>>Hoog: Exactly.>>But if you turn off writes to the thing,
you should be able to gather a few hash on the flash as well?
>>HOOG: The problem is, behind the scenes, even if you’re not writing, the disk
is still managing its space actively. And so there’s wear leveling, and it’s
very difficult for forensics folks to come in because most of the information on
that topic is intellectual property. So when we grab a solid-state drive from Toshiba or
Intel, that…>>They don’t tell you what they’re doing?
>>HOOG: They don’t tell you what they’re doing. The wear leveling, the bad block
management, the re-manipulating and moving of data around to optimize it, all that happens
behind the scenes, we’re not aware of it. Now, we’ll talk about how Android’s a little
bit different. We have some more access in the Android space. It’s still problematic
and you don’t even have to write anything. You can literally hook up a write blocker,
nothing is being written and it’ll still come out with a different hash value.
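
The hash-based verification described above can be sketched in a few lines of Python. This is only an illustration, not the tooling used in the talk: the helper name `hash_stream` and the in-memory sample data are assumptions, and a real image would be opened from disk instead. It hashes data in chunks, confirms that a bit-for-bit copy produces the same digest, and shows the avalanche effect of flipping a single bit.

```python
import hashlib
import io

def hash_stream(stream, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash a file-like object in chunks so a large image never has to fit in memory."""
    digest = hashlib.new(algorithm)
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()

# Stand-in for an acquired image; a real one would be opened with open(path, "rb").
original = bytes(range(256)) * 4096          # 1 MiB of sample data
copy = bytes(original)                       # bit-for-bit copy
tampered = bytearray(original)
tampered[12345] ^= 0x01                      # flip a single bit

h_original = hash_stream(io.BytesIO(original))
h_copy = hash_stream(io.BytesIO(copy))
h_tampered = hash_stream(io.BytesIO(bytes(tampered)))

print("copy matches original:", h_copy == h_original)
print("tampered matches original:", h_tampered == h_original)

# Count how many of the hex digits differ after the one-bit change.
differing = sum(a != b for a, b in zip(h_original, h_tampered))
print("hex digits that changed:", differing, "of", len(h_original))
```

Passing `"md5"` as the algorithm gives the shorter 32-digit signature mentioned in the talk. For flash media, the same function would simply be run once on the acquired image to record a point-in-time hash.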
>>Okay.>>Hoog: So let’s talk for a minute about
how to acquire a forensic image. So just conceptually, if possible, if you’re dealing
with a solid-state drive that you pull out or if you’re dealing with a traditional hard
drive, you hook it up to a physical write blocker. These are little black boxes you
can buy from [INDISTINCT] and a number of other companies. And that physically prevents
any writes from ever going back to the drive. It essentially intercepts them and then doesn’t
pass them through. There are software techniques you can use: on Linux, you can flip some flags,
and Windows has got a USB driver. If you’re really doing something for a court or maybe to use
somewhere else, you know, don’t fuss around, that’s [INDISTINCT] on
a write blocker, a physical write blocker. And again, this is essentially impossible to do
in the NAND Flash space unless you do a physical chip removal and put it on a chip reader where
you’ve stripped away, you know, essentially all of that flash translation layer and things
of that sort. Then you physically acquire the device with software.
We don’t do a lot of commercial stuff. In fact, I don’t think we use any commercial tools
in house for this. So we primarily focus on open source, and the presentation
will just give you examples, all on open source. There’s a whole bunch of different tools out
there. They’re maintained sometimes by different federal agencies, by different forensics companies.
There’s a couple of examples up here. The Department of Defense “dc3dd” is the one that
we use the most, the mobile [INDISTINCT] example. There are also some free tools out there. For
instance, FTK puts out an imager. FTK is a commercial forensics company, but they have
a free imager out there that you can use to acquire an image. So you can download
that and run it on the command line on a Mac, on Windows and I think a couple of other environments.
And then there are the full-blown commercial tools that will also do this. So, a lot of
forensic shops go down the commercial route, they kind of drink that Kool-Aid and they do
all of their acquisition and analysis in a particular commercial tool. After you do the
forensic acquisition, you then want to do the verification where you essentially reread
the source device and you compare the hash signature and make sure that you have that
identical copy. So here’s an example, and if you guys want to refer back to this, we’re going
to post this out on viaforensics.com, our web site. Anybody can take a look at it and I
know the Google folks are going to put this up on YouTube. But the Department of Defense
has a cyber crime center, they have a vested interest in making sure there’s validated
software that works and allows them to do their job and so they put that out there as
open source. It’s a patched version of dd that you’ve seen on many Unix systems. But
they did add a number of features that are helpful in the forensic space. I put
an example up here where essentially you type in the dc3dd command, you give it your source
device, which would typically be /dev/sda or sdb, whatever [INDISTINCT] device
has been assigned by the operating system. You always put the word “of”, you put “of=”,
so this is the output file that you’re going to write into. So you give it some name like
drive01.dd, turn on verbose, do a hash signature on the fly, track that in a log file, and
the very last thing is rec=off. And that basically determines what happens when the drive has
errors. So we [INDISTINCT] two drives yesterday, it was going to be a one-day turnaround,
we were going to get it back and overnight it that same day, and wouldn’t you know it,
both drives were throwing out read errors. So we were unable to acquire the drive and get
a hash signature. So this tells the program what to do when you encounter an error. And
so we basically say, “If you see an error, do you want to keep going with the recovery
or do you want to stop and then go figure out what to do next?” I built this from source.
If you’re going to use this on a workstation, it takes like 10 seconds. You can just download
it from SourceForge. And I just gave an example here. On the second line, you can see write
protect is on. This was a 500 gig hard drive. When you use a physical write blocker and you
connect it up, if you look in the system messages or dmesg on your Linux box or whatever,
you’ll actually see write protect is on, so the operating system has detected that
it’s unable to write to the device. And so this is the type of logs that we capture
to show the process that we use. About 10 percent of our cases involve failing hard
drives, so what do you do with those? You don’t want to throw in the towel and say you can’t
get anything off this device. So you have that little flag that says, “Hey, what do
you want to do?” You either stop when you have an error and then decide, “Hey, I’m going
to go down a totally different path and try to figure out what I’m going to do.” Or, a lot
of times, people will continue on error and simply take the sectors that you can’t read and pad
them with nulls. That way you maintain the same size of the dd image as the actual hard
drive, you rip past the bad blocks and you pad them with zeros and then you decide what
to do: do you go back and explain that later, or how do you want to deal with it?
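
The continue-on-error strategy described above, where unreadable sectors are padded with zeros so the image stays the same size as the drive, can be sketched as follows. This is a simplified illustration and not how dc3dd itself is implemented: `read_sector` is a hypothetical callable standing in for the device, and the failing drive is simulated.

```python
import hashlib
import io

SECTOR_SIZE = 512

def image_with_padding(read_sector, sector_count, out, sector_size=SECTOR_SIZE):
    """Copy sector by sector; unreadable sectors become zeros so offsets stay aligned.

    read_sector(i) is a hypothetical callable that returns the bytes of sector i
    or raises OSError when the drive reports a read error.
    Returns (sha256 hex digest of the image, list of bad sector numbers).
    """
    digest = hashlib.sha256()
    bad_sectors = []
    for i in range(sector_count):
        try:
            data = read_sector(i)
        except OSError:
            data = b"\x00" * sector_size     # pad instead of skipping
            bad_sectors.append(i)
        out.write(data)
        digest.update(data)                  # hash on the fly
    return digest.hexdigest(), bad_sectors

# Simulated 8-sector drive where sectors 3 and 6 cannot be read.
def fake_read(i):
    if i in (3, 6):
        raise OSError("simulated read error at sector %d" % i)
    return bytes([i]) * SECTOR_SIZE

image = io.BytesIO()
image_hash, bad = image_with_padding(fake_read, 8, image)
print("image size:", len(image.getvalue()), "bad sectors:", bad)
```

Keeping the list of padded sectors alongside the hash is what lets an examiner explain later exactly which parts of the image are zeros because of read errors.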
The other option that exists is that you skip the bad blocks and that’s probably a really
bad idea. If you image a 500 gig hard drive and 499 gigs come through, you’ve got a problem
explaining why there are these differences. So this is an example here of what you’ll see, or what
you don’t want to see. We’re imaging a hard drive that’s connected as sde, we start getting
abort commands, we can’t sense the information and then it says, “Hey, I can’t read this sector.
I’ve got to find out I’m buffering or I can’t do anything.” The trick that we found, and it’s
great software, again, this is open source under the [INDISTINCT] project, is
ddrescue. It’s an extremely powerful program. You go out there, you compile it and what
it does is it begins to read the drive as fast as it can. As soon as it starts hitting
bad blocks, it essentially skips over them and it keeps going. And the idea is, if your
hard drive is going to fail, let’s rip every piece of data we can off of it as quickly as
possible. By the time it gets to the end of the drive, it has maintained a list of which blocks
are bad, and then it goes back and it takes the size of the block that it reads and it
makes it smaller and smaller and smaller. And it takes a long time, but we typically
get very, very good recovery by skipping over the bad blocks first. It has the ability to read
things backwards, to read them direct or indirect, it has a whole bunch of different options.
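
The idea described above (grab the easy data quickly with large reads, remember what failed, then go back over only the failed ranges with smaller and smaller reads) can be sketched like this. It is a toy model under stated assumptions, not the actual ddrescue algorithm, which does considerably more: `read` is a hypothetical callable, the block sizes are arbitrary, and the damaged drive is simulated.

```python
def rescue(read, total_size, block_sizes=(4096, 512)):
    """Read with large blocks first, then retry only the failed ranges with smaller ones.

    read(offset, length) is a hypothetical callable returning bytes or raising OSError.
    Returns (recovered bytearray with zeros where nothing could be read,
             list of (offset, length) ranges still unreadable).
    """
    image = bytearray(total_size)
    pending = [(0, total_size)]              # ranges still to be tried
    for block in block_sizes:
        failed = []
        for start, length in pending:
            end = start + length
            for offset in range(start, end, block):
                size = min(block, end - offset)
                try:
                    image[offset:offset + size] = read(offset, size)
                except OSError:
                    failed.append((offset, size))   # note it and keep going
        pending = failed
    return image, pending

# Simulated 16 KiB drive with one unreadable 512-byte sector starting at byte 5120.
DRIVE = bytes([0xAB]) * 16384
BAD_START, BAD_END = 5120, 5632

def fake_read(offset, length):
    if offset < BAD_END and offset + length > BAD_START:
        raise OSError("simulated read error")
    return DRIVE[offset:offset + length]

recovered, unreadable = rescue(fake_read, len(DRIVE))
print("still unreadable:", unreadable)
```

In this simulation the first pass loses a whole 4096-byte block to a single bad sector, and the second pass wins back everything except the 512 bytes that are genuinely unreadable.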
So if you ever have a hard drive from a family member, and I’m sure you guys have
gotten these requests before, where the hard drive has partially failed, all is not lost. Take
a look at ddrescue, it’s a great program and again you can just download that and compile
it. So, I want to give just a quick overview of what a typical forensic investigation
looks like. And I say typical because there really isn’t a typical one, but there are
a number of steps that you ought to consider. We believe very, very strongly in building
a timeline of events. It’s the first thing that we want to do when we get a computer.
We want to figure out the entire, what we call MACB, the modified, accessed, changed and
created (born) timestamps on the entire file system. We want to rip through the metadata inside
the files themselves and we want to build an entire timeline of anything that happened
on that device. And now, we can zero in and say this file was modified at this time, we
saw a registry change here, and somebody connecting a USB drive, and we can find out everything
that was happening. So the first thing that we do is we create a timeline and I’m going
to step you through these in some examples. You don’t have to mount the dd image to do
that, so we’ve got some special software, open source software, that will allow you to do
it. We then mount the dd image read-only, so we’re only working on the copy
but we still mount that read-only. We list off every file that’s on there and every file that the
file system is aware was deleted, so we list off every single file, deleted and not
deleted. And then we begin to analyze key files. If you’re in the Windows space, you’re
going to be looking at registry files, link files, user profiles, web history, whatever
it happens to be. If you’re running an [INDISTINCT] or Linux, you’re going to look at the bash
history, you’re going to look at the recently run programs, you’re going to look at the GVFS
metadata about the file systems that have been connected. You’re going to try to basically
piece together all of the information on the system. At that point, we typically move
into recovering deleted files that may be important to the case. Now, deleted
files are typically still referenced inside the database, if you will, of the file system.
So, in an NTFS file system, you have the Master File Table, the MFT, or you’ve got other
file systems that have their own table. It is essentially a list of all the different
files that it’s aware of, where their inodes are, where they point. And when you delete a file,
the entries can still exist in that master database. So, we’ll go into the NTFS database,
the MFT, and we’ll parse it out and then decide whether we can recover somebody’s deleted
files. I’ll show you an example of that in a minute. If you’re unable to recover them,
there’s still the possibility that the data exists in what we call unallocated space,
space that was perhaps allocated at some point. The operating system says I am not using this
anymore but maybe there are files or file fragments in there. So, we use a technique
called file carving. We’ll go in and see if we can extract or carve out
files from that unallocated space. Then we may do something like a full index search of the
dd image, and a full index search of all of the logical files so that we can come in and
search and look for keywords and things of that sort. And then from there it really goes
in a million directions. People hide sensitive data in other files. That’s called steganography.
You can go in and try to figure these things out. So, there’s a million different specialties
in the forensic space. But these first six or seven steps are really
what you’re going to do in many, many investigations to get that start. So, in this part of
the talk, we’re going to go into specific examples. These are all open source tools
that again, you can download and install. There’s an excellent tool, we use it all the
time. It’s called the Sleuth Kit. It’s written by Brian Carrier. He still actively maintains
it. He also wrote a fantastic book, we call it FSFA. It’s File System Forensic Analysis.
It’s 400 pages of everything you wanted or didn’t want to know about file systems. And
if you have insomnia and, with all respect to Brian, pick up that book late in the evening,
you’ll be set, or pick up our book, it’s pretty much the same thing. But when you need to
go in there and understand why Microsoft updates this timestamp in milliseconds, this one
in an hour and then here it’s every two seconds, Brian’s got all the details in his book. It’s
kind of the bible for forensics people when it comes to file systems. So, he’s got the
book and then he has the Sleuth Kit that’s out there. You can install that with different
forensic packages and whatnot. But again, if you’re going to be playing around with
this, just download it from source. It’s very, very simple to compile. He pushed out an
update two or three days ago. It supports a lot of file systems that you may run into,
NTFS, FAT, different Linux file systems, CD-ROMs, and it’s just sitting out there. It’s sleuthkit.org.
So we’re going to spend a few slides going through some examples. One of the first
programs you’ll find in there is called mmls, media management ls, if you will. And that
basically gives you partition info. So, if you look at the screen here, you can see that
I’m doing an mmls on /dev/sdb. So that’s on a physically connected disk. You can just as
well do this on dd images. And you can see each one of the different pieces of the file
system. It’s probably quite obvious here that we’re dealing with a Linux file system. You
can see that, and this is very, very common, the DOS partition table is typically 63 sectors
long. The first sector tells us everything that we need to know about the file system and
then you’ve got 62 sectors that are essentially unused, unallocated. So, that’s why you see
the primary partition table and then unallocated. And then you can see that we’ve got a Linux
partition, ext3. And you go down the table and you can see all the different data. So,
this would tell us, for our physical device or dd image, what does the file system look like
and where should we be looking for data? In a lot of cases, we’re going to jump right
into the ext3 or the NTFS. If we’ve got somebody that’s very good technically, we might start
looking at unallocated space and say, “You know what? Somebody could hide data physically
on the drive and move it into an unallocated partition.” That’s easy enough for us
to see here, and it kind of focuses the investigation. So, mmls will give you that background information.
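
The kind of partition information mmls reports can be illustrated by reading a DOS (MBR) partition table directly. The sketch below is not mmls and makes several assumptions: it handles only the four primary entries in the first sector, it assumes 512-byte sectors, and the sample sector is synthetic with made-up values. The MBR layout it relies on is standard: four 16-byte entries starting at byte 446, with the signature bytes 0x55 0xAA at offset 510.

```python
import struct

SECTOR_SIZE = 512

def parse_mbr(sector0):
    """Return a list of (type_byte, start_sector, sector_count) for non-empty MBR entries."""
    if len(sector0) < 512 or sector0[510:512] != b"\x55\xaa":
        raise ValueError("not a DOS/MBR partition table")
    partitions = []
    for n in range(4):
        entry = sector0[446 + 16 * n: 446 + 16 * (n + 1)]
        # status, CHS start (3 bytes), type, CHS end (3 bytes), LBA start, sector count
        _status, _chs_a, ptype, _chs_b, start, count = struct.unpack("<B3sB3sII", entry)
        if ptype != 0:
            partitions.append((ptype, start, count))
    return partitions

# Build a synthetic first sector: one FAT16 (type 0x06) partition starting at sector 129.
entry = struct.pack("<B3sB3sII", 0x00, b"\x00" * 3, 0x06, b"\x00" * 3, 129, 100000)
sector0 = b"\x00" * 446 + entry + b"\x00" * 48 + b"\x55\xaa"

for ptype, start, count in parse_mbr(sector0):
    print("type 0x%02x start sector %d byte offset %d" % (ptype, start, start * SECTOR_SIZE))
```

Multiplying a partition's starting sector by the sector size gives the byte offset into the image, which is the number a loopback mount or an offset-aware tool needs in order to find that file system.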
The next thing that you can do is you can run a program in the Sleuth Kit called fsstat,
file system statistics. It will give you a lot of information. On this particular one,
you see I switched to a different file. This time I’m looking at a dd image of webOS
taken from a Palm Pre. So, this is a dd where we went out and got a physical image of
a Palm Pre that was running webOS. We’d done a secure erase on it. We wanted to see how
effective secure erase is on webOS. And so by doing fsstat, you can see a lot of this
didn’t fit on the screen. So, I just kind of cut it off after the first couple of lines.
But you can see information about the file system. What file system? What was the volume
ID? When was it last written or updated, mounted? A whole bunch of information, and it gets into
all the metadata and then will individually list out each of the files and the inodes,
what files they’re connected to, and essentially allow you to reverse the entry and recover
information. So, that’s fsstat. Now, the one that’s really interesting, and we
use it quite often, is something called fls, forensic list or forensic ls. This is a utility where we
can come in and we can query things like the Master File Table, the MFT, that’s part
of NTFS. And we can say, rip through that whole database and tell me everything that you see
on the system. You can provide different offsets. So, you can take a dd image and you could
just examine that third partition or that fourth partition. So, fls will basically rip
through and pull out everything about the file system that it can
find. Again, I refer back to this MACB. This is going to give us any time that a file was
modified, accessed, changed or created. And we use this to build our timeline analysis.
And here’s an example of a command running fls, putting in the Central time zone, otherwise
it will be in GMT. We track what the skew is in terms of the real time versus what the
BIOS is reporting. So, when we do an investigation, we boot up the computer, we look at the atomic
clocks that we have, we look at the BIOS, we figure out there’s a three-second skew. And
that’s probably important in, like, an incident response, where we have to come in and try to
decide whether or not something happened three seconds earlier, and it matters if we’re matching
up logs. So you can tell it what the skew offset is, you give it a label, a file system type, some
offsets and then you basically point it at the file. So in this particular example, we’re
actually looking, down here, at the command. I did it against an NTFS file system. And you
can begin to see, and it’s difficult to read, the next slide will address that, the different
files. If you’ll notice, about halfway down, there’s one that says $MFT and then
$MFTMirr. Those are the two NTFS databases that track your entire file system. It actually
stores a primary MFT and then it mirrors the MFT. So, if somebody tries to wipe out
your entire system, we have the ability to protect you. You can come back and grab the
mirror MFT and essentially recover what has been mirrored by the operating system. So,
by looking at it from a forensics perspective, we’re actually looking at the dollar sign
special files that you don’t have access to with the normal operating system. Then we
can parse that information out. Now, looking at it in this format is a little challenging.
So, the file that you created is called a body file. It’s just the terminology that
the forensics community came up with. And so what you do is you then take
a program called mactime. And you point it at the body file and you say, “Hey, I need to
make this human readable. So, give me something that’s better to use.” And so in forensics, what
we’ll typically do is we’ll put it into CSV. And then we can hand this off to attorneys.
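
The step from body file to readable timeline can be sketched in Python. This is a simplified stand-in for mactime, not a reimplementation of it: it assumes the pipe-delimited Sleuth Kit 3.x body layout (MD5, name, inode, mode, UID, GID, size, atime, mtime, ctime, crtime), it would mis-split file names that themselves contain a pipe character, the CSV columns are my own choice, and the two sample entries are made up.

```python
import csv
import io
from datetime import datetime, timezone

BODY_FIELDS = ["md5", "name", "inode", "mode", "uid", "gid", "size",
               "atime", "mtime", "ctime", "crtime"]

def body_to_timeline(body_text):
    """Turn body-file lines into time-sorted rows of (utc_time, macb_flags, size, name)."""
    rows = []
    for line in body_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        rec = dict(zip(BODY_FIELDS, line.split("|")))
        stamps = {"m": int(rec["mtime"]), "a": int(rec["atime"]),
                  "c": int(rec["ctime"]), "b": int(rec["crtime"])}
        # Group the four timestamps so events at the same second share one row.
        for when in sorted(set(stamps.values())):
            if when == 0:
                continue                      # 0 means the timestamp is not set
            flags = "".join(k if stamps[k] == when else "." for k in "macb")
            utc = datetime.fromtimestamp(when, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
            rows.append((utc, flags, rec["size"], rec["name"]))
    return sorted(rows)

def timeline_csv(rows):
    """Render timeline rows as CSV text with a header line."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["time_utc", "macb", "size", "name"])
    writer.writerows(rows)
    return out.getvalue()

# Two made-up entries: a live file and a deleted one.
SAMPLE = (
    "0|/docs/report.doc|64-128-2|r/rrwxrwxrwx|0|0|1024|1300000200|1300000100|1300000100|1300000000\n"
    "0|/docs/secret.txt (deleted)|65-128-2|r/rrwxrwxrwx|0|0|20|1300000300|1300000300|1300000300|1300000050\n"
)
print(timeline_csv(body_to_timeline(SAMPLE)), end="")
```

Each row carries a four-character flag string in the order m, a, c, b, so a row marked `m.c.` means the file's modified and changed timestamps both fall on that second. Times are rendered in UTC here; a real case would also apply the time zone and clock skew discussed above.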
We can go in and fill a dirt and say, “Hey, show me everything that was modified at this
time. Show me anything that was deleted or whatnot.” In this particular example–I’m
going to go back on page–well, in this particular example, maybe we can cover it later, you
can actually see the files that have been deleted and the files that have been deleted
and reallocated. If they’d been reallocated, that’s basically been reused by another file
but we don’t have it fully recovered. If it simply shows up as deleted, we have tools
that will then jump in and recover that deleted information off of the drive of the dd image.
So, once you’ve got the dd image, you’ve got a forensics copy, you’ve got your hash signature,
now, what you want to do? You need to mount that dd image. You need to be able to open
that single file up and do stuff on it.>>I just want–what circumstances, there’s
a record of the file actually saying on the disk either as just deleted or it’s reallocated?
>>Hoog: So, if somebody deletes it–and I actually had a different example and I think
I changed it out at the last minute. But if somebody comes in and deletes a file in NTFS
File System–and we’ll talk about the Apps2 next, which is a large structured file system.
It’s totally different in how they handle it. But essentially, that record stays in
the MFT database until it gets reallocated, until the file system says, “Hey, I need to
reuse that space.” And so, what we end up having is the file system marks it as deleted
but it’s still sitting there. It’s still allocated on the disk. It’s still referenced as deleted
but it’s never shown up in the actual file system. So, when you come in with like fls,
you’ll find tons of references of deleted files that are sitting there and recoverable.
Now, there’s another case where it’s still sitting in the MFT but some of the sectors
on the disk that were comprising that file get reallocated for another file. And then we
have a situation where we’re aware that the file existed but part or all of it has been
reused on the–on the disk. So then we get a status flag of deleted/reallocated.
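
A minimal Python sketch of how those two states can be told apart in an fls listing. It assumes the commonly seen fls line shape, where an asterisk marks a deleted entry and `(realloc)` follows the metadata address of a reallocated one. The `classify` helper and the sample lines are made up for illustration and are not from the talk.

```python
import re

# Assumed fls line shape: "<type> [*] <address>[(realloc)]:<tab><name>"
FLS_LINE = re.compile(
    r"^(?:\++\s)?"                 # optional depth markers printed by `fls -r`
    r"(?P<type>\S+)\s"             # entry type, e.g. r/r or d/d
    r"(?P<deleted>\*\s)?"          # an asterisk marks a deleted entry
    r"(?P<address>[\d-]+)"         # metadata address, e.g. 71-128-4 on NTFS
    r"(?P<realloc>\(realloc\))?"   # present when the entry has been reallocated
    r":\s+(?P<name>.*)$"
)


def classify(line):
    """Return (name, status) for one fls line, or None if it does not parse."""
    match = FLS_LINE.match(line)
    if match is None:
        return None
    if not match.group("deleted"):
        status = "allocated"
    elif match.group("realloc"):
        status = "deleted/reallocated"
    else:
        status = "deleted"
    return match.group("name"), status


# Hypothetical sample lines, not real case data.
sample = [
    "r/r 64-128-2:\tnotes.txt",
    "r/r * 71-128-4:\tbudget.xls",
    "+ r/r * 85-128-3(realloc):\told.doc",
]
for line in sample:
    print(classify(line))
```

Entries that come back as plain "deleted" are the ones most likely to be fully recoverable; "deleted/reallocated" entries may have had some or all of their sectors reused.
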
>>But for most NTFS systems, files that you deleted years ago still show up?
>>HOOG: It’s kind of a mix. The file–the system level files tend to get reallocated
quite a bit but we find a lot of user-based files that we do end up recovering and it
depends. If somebody had a 250 gig hard drive and they only used 30 gigs, and let’s say they came
in and deleted all their files or went into Internet Explorer and tried to do a clear
cache because, you know, they wanted to hide what they were doing, we’ll essentially recover
all of that. Now, if it was five years ago, it kind of depends…
>>You also have a record of files…
>>HOOG: A lot of times you have the record
unless the MFT itself does a cleanup and basically, you know, completely gets rid of that. But
in general, we see all that information. So, in this particular example, we need to mount
the dd image. So we come in with mmls and we take a look at the dd image that we have
out there. Just like you saw with the physical disk, you can see the partition table. This
actually was pulled out of an Android device and was the SD card. We pulled out the SD
card, we imaged it and we can see that, unlike most hard drives, we actually had the primary
partition table in the first, basically, one sector, and then we have a hundred and
twenty-nine sectors that are unallocated, up through a hundred and twenty-eight, and then at sector a hundred
twenty-nine [INDISTINCT] the FAT16 File System. So, we basically use that information,
that 129, to then go out and mount the file system. So, we go out and we create a directory
and with sudo access, you basically say, “Hey, I want you to mount the vfat file system.
I want you to mount it on a loopback.” So we’re setting up a loopback device because we don’t
have a physical device that we’re using, mounting it read-only, and here’s my offset. The offset
number is basically the start sector of the FAT16 partition times the size of the sectors, so 129 times 512
will lead you out to 66048. So, it tells mount to seek out to that part of the file,
mount it read-only as a FAT file system, and then here’s my dd file and where to mount
it. If you then go out and take a look at the–at your mount tables, you’ll see that
on /dev/loop0, we have the VFAT File System and then you can see that the VFAT File System
here at the very bottom has got 1.9 gigs, and 244 megs are used. So, you can basically mount
that dd image on your workstation. At that point in time you jump in and you can do any
analysis that you want because you’re working on a read-only copy of the original source
media. So, couple more slides and I want to just kind of give you some ideas. There’s
a gentleman I’ve been speaking with for probably over a year now, Kristinn Gudjonsson, he’s
out in Iceland, he developed log2timeline, which I slightly misspelled as I have it
here. But log2timeline was Kristinn’s attempt to basically say there’s a lot of
valuable metadata in individual files, sitting in registry files. I can pull out timing info
from registry files, from the event logs, from the MFT, prefetch, browser history, flash
cookies done by the flash [INDISTINCT] so, he’s got 46 different file types that he can
extract timeline data out of. And so, if you download his software and essentially compile
that, it’ll export into ten different formats. It’s just sitting out there at log2timeline. It’s
great software and so what we do is we just run a piece of his software, it’s called timescanner.
So we basically tell timescanner to go in, to look at the mounted SD card directory that
we mounted the file system at, to put everything in the Central time zone and to rip out any piece
of forensic metadata, file–timeline metadata it can from all the files that it finds. And
so it’ll find dll’s, what time the dll’s were created, what sort of cookies are found, any
kind of information and it will put that into a body file. We take that same body file that
the Sleuth Kit helped us build, put those two things together and then we run a [INDISTINCT]
against that and create a–basically what’s called a super timeline. So, we’ve got every piece
of information we could want, everything we could possibly extract out of that device, and now we’ve
got a timeline. Couple of other tools to mention. Harlan Carvey, he’s also published quite a few
books [INDISTINCT] he focuses really only in the Windows space and he developed a tool
called RegRipper. A lot of people use these together. It’s in Perl–I tried to convince Kristinn to
move to–Kristinn to move to Python and I think he’s considering it. Harlan does all
his stuff in Perl and he actually wrote it for the Windows platform but there is a Linux
[INDISTINCT] which is the one that we use. And the goal of RegRipper is essentially to
parse out the Windows registry files, pull out every piece of information it can possibly
get out of the registry. And it’s pretty amazing what you can find there in the registry. So,
you can go out to regripper.wordpress.com and essentially download that tool, compile
it and you can specify the registry file and then what sort of data you want to extract
from it. This is open source software and it will rip forensic data out of a Windows
system. The last tool that I want to talk about is Scalpel. Scalpel is a file carving
utility. Again, this is open source for years and years of sitting, I’ve [INDISTINCT] and
about a month ago, they released a [INDISTINCT] version. So you can go out to the website,
download Scalpel, you can compile it, and they don’t actually have a make install,
so you can basically copy that into /usr/local/bin. And the idea with Scalpel is
that many files have essentially a magic number at the very top and you can identify a single
live file, you can identify a JPEG. And so what Scalpel does is it rips through the dd
image and it says, “Hey, I’m looking for any of these known file headers.” They specify
a bunch of them ahead of time for you. You can basically put your own ones in there.
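
The header search described here can be sketched in a few lines of Python. This is not Scalpel's implementation, only an illustration of signature-based carving: the `carve` function and its parameters are hypothetical, the JPEG signature (starts with FF D8 FF, ends with FF D9) is used as the example, and falling back to the maximum size when no footer is found is an assumption of this sketch.

```python
from typing import Iterator, Optional, Tuple

JPEG_HEADER = b"\xff\xd8\xff"
JPEG_FOOTER = b"\xff\xd9"


def carve(image: bytes, header: bytes, footer: Optional[bytes],
          max_size: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, data) for every header hit in a raw image.

    If a footer is given and found within max_size bytes of the header,
    the carve ends just after it; otherwise max_size bytes are taken.
    """
    pos = image.find(header)
    while pos != -1:
        window_end = min(pos + max_size, len(image))
        end = window_end
        if footer is not None:
            hit = image.find(footer, pos + len(header), window_end)
            if hit != -1:
                end = hit + len(footer)
        yield pos, image[pos:end]
        pos = image.find(header, pos + 1)


# Tiny fake "image": padding, one JPEG-like blob, more padding.
fake_image = b"\x00" * 10 + JPEG_HEADER + b"abc" + JPEG_FOOTER + b"\x00" * 5
for offset, data in carve(fake_image, JPEG_HEADER, JPEG_FOOTER, max_size=100):
    print(offset, len(data))
```

Because the search runs over the raw bytes rather than the file system, it can find files in unallocated space, at the cost of false positives and of fragmented files being carved incorrectly.
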
And anytime it finds one, it then will parse through the system and look for the footer.
If it has a defined footer, it will go through and do 10k or 800k or whatever you tell it
to do. It’ll look for a separate type of identifier and then reverse and go backwards, so for some
PDF files, you need to find the start, find the bottom marker and then go back up a couple
spots and so it will find the last one. So, there’s a lot of functionality built into
Scalpel that will allow you to carve files out of the file system. So, there’s kind of
a standard scalpel.conf that comes with it. We developed our own Scalpel configuration
files for Android and iPhone because they are different types of file systems and we’re
pulling out different information. All of that goes in the Scalpel output directory and
then you can go in and see all the recovered files. So, I’m going to shift gears here and
I want to talk a little bit about the Android space. Android obviously uses NAND Flash
Memory. This is, again, we have a specialty in this. We got our books around, we’ve got
some commercial software. Unlike iPhone and other platforms, the Android folks decided
to not have NAND Flash Memory where the manufacturer had to use a certain one. So, it allows them
to use any NAND Flash that they want. And they provide this layer that sits between
the developer and the NAND Flash called the Flash Translation Layer, that basically exposes
the flash as a block [INDISTINCT] so, that is implemented in software in the Android
space and the Flash Translation Layer handles the wear leveling, bad block management, some
of the stuff we were talking about earlier. In Android and in Linux, the Flash Translation
Layer that most people use is called MTD, Memory Technology Devices. Again, it’s another open
source device. The newer Android devices, Samsung started doing this first. They’re actually
beginning to move away from the MTD and they’re coming out with their own NAND Flash chips
that have the Flash Translation Layer built in the [INDISTINCT] it’s already baked in
so we don’t have the same kind of access we had in the earlier Android devices. But–so
those are built in the [INDISTINCT] but on a lot of the other phones, we still have the
/dev/mtd devices where we can do our physical imaging. Now, MTD divides the memory essentially
into different blocks. The setup is a little different than a traditional hard drive, you’re
normally looking at a 128k block and there are 64 bytes of Out-of-Band data that are stored
inside the block for each chunk or each particular page. And inside that is where–for instance,
YAFFS2 stores a bunch of metadata, bad block markers, error correction code and things
of that sort. So, this is kind of what it looks like in the Android space. So, if you
have an Android device using MTD, Memory Technology Devices, to access your NAND
memory, you basically have 132k as your block size. In each you have 64 two kilobyte chunks
and after each one of these 2k chunks you have 64 bytes of Out-of-Band data. The great
thing about doing forensics on Android with MTD is that when you’re able to get your hands
on the OOB data, you can do a lot more with the devices because we’re actually seeing
how the NAND Flash is being managed by the Flash Translation layer. So, we can see where
the bad blocks have been marked. What is the wear leveling technique? Can we reassemble
the blocks back in [INDISTINCT] allocation even though they’re scattered all out over
the physical image? So that is a big change for us and something that in the Android space,
we’re now able to do and this is kind of what it looks like. As we talked about earlier,
there’s a couple of different forensic techniques that you use. You’ve got your physical techniques,
but the first thing that you start out with is a logical recovery. In most cases it’s sufficient
to start there, it’s the least complex. In the Android space, you do a logical recovery
using content providers. It’s an interface that the Android team built in to allow apps
to share data. So we essentially come in, we say, “Hey, we want to share some of the–take
some of that information that’s being shared.” We have a free tool that we developed. We
give it away to law enforcement and to different
government agencies. It’s called AFLogical and it basically goes out and it takes the
content providers, it reads that information logically so it won’t get any deleted data.
And then it stores it and analyzes it. We [INDISTINCT] about 10 days ago to release
the commercial tool based on the AFLogical that takes all of the manual stuff that had
to be done, does different analysis on it and puts it into a virtual machine and makes
it point-and-click, kind of easy. So, logical recoveries are the primary thing that you could do on Android
devices. But we’re interested in moving beyond those content providers, those–the CPros
because we’re only getting the information that the Android developer chose to share
with us. So, we can pull an SMS, we can pull out–right now, we pull out about 40–40,
45 content providers, we’re working on a new version that may pull out a couple hundred,
but it’s still a limited amount of data. So, to get beyond the content providers, you basically
need to escalate privileges, you need to get some sort of root access to the device. Now, if
you had the original group of Dev Phones, you just had–su access, that was great,
no problem. This talk is not about how to–how to get [INDISTINCT] on Android. If you want
to do that, you know, you can Google Dev Phone, go out to XDA, go buy our book. So we’re not
really going to cover how to do that, but basically if you escalate privileges on the
device, you can then take the next step forward which says, “All right, I want to tar gz up
the entire file system.” It’s not the same as having unallocated, but it is going to
get us everything under /data/data. And if you’re in Android space, if you can get that
directory, you’ve got a lot of what you need. So, that would be all of the SQLite
databases, preferences, files, pictures, images that app developers are storing inside their
protected space that they could–when they spin off any [INDISTINCT] so we’ll–if we
can escalate privileges, then we’ll go for a logical acquisition. You could push [INDISTINCT]
up to the phone as long as you recompile it for the ARM platform. You could tar gz it
and send it out like over netcat or you can just use something like an adb pull of /data as
a recursive pull. We have seen some issues when you do large recursive pulls, where you could
run into some problems. So, in general, if we’re doing it for a case, we’ll do a tar gz and
send it out over netcat. But the real goal in the forensic space, of course, is the
physical acquisition. And so in the Android space, once you’ve escalated privileges–and
quite frankly, it’s the same deal in the iPhone space, you get escalated privileges and then
you’ve got two options. In the Android space, dd comes built in. I love it. There’s no copy
command, there’s no cp command in Android. If you want to copy a file and you’ve got shell
access, you have to dd it from one file to the next and I like that and it makes me smile
and mostly confuses everybody else. If you do the dd, dd does not have access to the
Out-of-Band data. So, if you go and you do a dd on one of the MTD devices, you’re actually
not going to get all of the information that you would want for a forensic analysis. Now,
it gives you quite a bit of data, it’s going to get you unallocated data, but it’s not
going to get you all the pieces of the puzzle. So, what you really need to do–and this took
us, some folks, other people some time to figure out, but you need to go in there and
do a full NAND dump. That’s going to include all of that Out-of-Band data that we talked
about. We have a custom version of nanddump that we developed, it allows us to get a full
dump of the MTD partition and then on top of that, you have to deal with things like
bad blocks and things of that sort. So we basically built our own, but you could go out
and compile some of the nanddump versions out there that are available, do it for ARM, and essentially
use that as well. And once you do that, now, you can take advantage of all of the special
stuff that you get with YAFFS2. YAFFS2 is the file system that Google originally chose.
They’ve basically angled away from it, and some people run ext3 and some people are now–the
Google team and the Android team are going with ext4, but YAFFS2 is great. It was open source,
it’s a log structured file system, so the best way to think about that and I had to
look it up when I first read about it, is that it’s essentially like source control
on your file system. Because it doesn’t go back and ever rewrite a block, it can only
erase the block and then–and then write the data there, it just says it’s more efficient
for me to write at the front of the log. So, if you have a file and you change a couple
of bytes in it, it just says go ignore that previous block and rewrite that entire block
at the front of the log. So, what we get when we analyze the YAFFS2 file system, if
garbage collection hasn’t occurred, is basically an entirely versioned file system where we can
recreate every single state the file was ever in. Now, of course, in practice, we have
to reclaim space on the device and so garbage collection occurs and so we may end up having
fragments of different files. But in fact, we get a very, very dramatic recovery from
the YAFFS2 file system. I don’t think we’ve been geeky enough, so I want to take it up
one more notch here and say that–let’s take a look at YAFFS2 from a [INDISTINCT] point.
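
Before looking at the raw bytes, the chunk layout described earlier (64 chunks of 2 KB per block, each followed by 64 bytes of Out-of-Band data) can be sketched in Python. This is a minimal illustration that assumes the dump interleaves data and OOB exactly in that order; the function names are hypothetical, and real devices can use other page and OOB sizes.

```python
CHUNK_SIZE = 2048        # 2 KB of data per chunk, as described in the talk
OOB_SIZE = 64            # Out-of-Band bytes following each chunk
CHUNKS_PER_BLOCK = 64

STRIDE = CHUNK_SIZE + OOB_SIZE
BLOCK_SIZE_DATA_ONLY = CHUNKS_PER_BLOCK * CHUNK_SIZE   # 131072 bytes = 128 KiB
BLOCK_SIZE_WITH_OOB = CHUNKS_PER_BLOCK * STRIDE        # 135168 bytes = 132 KiB


def split_chunks(dump: bytes):
    """Yield (data, oob) pairs from a dump that interleaves data and OOB."""
    for offset in range(0, len(dump) - STRIDE + 1, STRIDE):
        data = dump[offset:offset + CHUNK_SIZE]
        oob = dump[offset + CHUNK_SIZE:offset + STRIDE]
        yield data, oob


def strip_oob(dump: bytes) -> bytes:
    """Drop the OOB bytes, leaving only the chunk data (roughly what dd sees)."""
    return b"".join(data for data, _ in split_chunks(dump))


# Fake three-chunk dump: "D" bytes stand in for data, "O" bytes for OOB.
fake_dump = (b"D" * CHUNK_SIZE + b"O" * OOB_SIZE) * 3
print(len(list(split_chunks(fake_dump))), len(strip_oob(fake_dump)))
print(BLOCK_SIZE_DATA_ONLY // 1024, BLOCK_SIZE_WITH_OOB // 1024)
```

The two block sizes account for the 128k and 132k figures quoted in the talk: the difference is the 64 × 64 bytes of OOB data per block.
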
So, if you have access on the device, you can get into /dev/mtd. So, here we
do a nanddump of the /dev/mtd device and I wanted to get rid of a bunch of zeros and Fs that
go flying by; that’s important to the file system but it’s not that interesting when
we’re looking at it on screen. So, what you essentially have here is we’re looking at
the raw flash NAND dump of a particular file. At the very top, you can see that the file
one.txt is the name. The YAFFS2 file system has basically two types of data. It either
has an object header or it has object data. What we’re looking at here is an object header,
so this is giving us the file name. And then most people would say, “Well, there’s no other
information over here, there’s nothing else I can do so, well, there’s just a bunch of
binary data and a couple here, so let’s move on.” But honestly, there’s quite a bit more
information here, you just have to look at the YAFFS2 source code, figure out what it
is. So, Android stores integers in little endian, so right to left. And if you look
in here, I highlighted a couple different things, you’ll see a repeating pattern of
36995D4D. In the end, this [INDISTINCT] of being a time stamp. So what you’ve got is
you’ve got a little endian number at the [INDISTINCT] so you actually have to completely reverse
these guys. So, you take that 3699 and you flip it around completely. So you end up with
4D5D9936. You take that number, that hex number and you convert it to a base ten number. You
come out with a time stamp and actually Android does time stamps in milliseconds for the most
part, but here you end up getting the number of seconds since 1970. As soon as you
recognize that date format, you can pass it into a number of tools, convert that date,
format it as a date time stamp. So, file one.txt, that was written on Thursday, February 17th
at 3:55 PM, which means that I was working on my book in the middle of a workday in mid
February. So, this is actually an example that was taken out of the book. But it’s very
interesting, with the YAFFS2 file system, you could essentially come back in, rip out
all of the object headers and recreate every single time that a file was accessed,
modified or changed on the entire file system. And so what you have to do is you got to get
into the source code. You have to look at the stuff in hex, you have to try to figure
out what the data looks like and then essentially write programs. The type of stuff that we’re
doing here is not supported in the commercial forensic tools for the most part. So, what
we can do is be a couple of years ahead of what the big forensics tools are going
to be, and we write our own Python scripts to essentially rip through the image, pull
off the OOB stuff and do some data carving, go back in, put the file system back in
block allocation order, start ripping out the object headers, build a timeline and let’s
figure out what happened on this device. So, by starting with the basic tools, the Sleuth
Kit, dd, hex editors, you can basically get physical images of these devices, work your
way all the way up into the hex dumps and then again figure out the file system structure.
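
The little-endian timestamp decoding walked through above can be sketched in Python. The four bytes are the ones that read 4D5D9936 once reversed; decoded, the value works out as seconds (not milliseconds) since 1970. The helper name is hypothetical, and the fixed UTC-6 offset is an assumption standing in for US Central standard time.

```python
import struct
from datetime import datetime, timedelta, timezone


def decode_le_timestamp(raw: bytes) -> int:
    """Interpret 4 raw bytes as an unsigned little-endian integer."""
    (value,) = struct.unpack("<I", raw)
    return value


# Bytes in on-disk order; reversed byte by byte they read 4D 5D 99 36.
raw = bytes.fromhex("36995d4d")
seconds = decode_le_timestamp(raw)

# Assumption: a fixed UTC-6 offset stands in for US Central (standard time).
CENTRAL = timezone(timedelta(hours=-6), "CST")
when = datetime.fromtimestamp(seconds, tz=timezone.utc).astimezone(CENTRAL)

print(f"{seconds:#x} = {seconds}")   # 0x4d5d9936 = 1297979702
print(when.isoformat())              # 2011-02-17T15:55:02-06:00
```

February 17, 2011 was a Thursday, so the result lines up with the Thursday, 3:55 PM date given in the talk.
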
YAFFS2 is interesting, they actually don’t track the access time. Because every time
a file is accessed, they didn’t want to rewrite a new object header, which would be a new
write to the–to the NAND Flash which would ultimately wear the device out. So, there’s
an atime that’s in there. It’s actually the first time that it was created and then they
never update the access time after that. But they do also track the modified time and
the changed time on the file. You could pull out the object ID that’s out in the Out-of-Band data,
you can do different cross referencing and basically figure out, you know, what file
is this, what inode, you know, what are the different blocks that are used in the allocation.
So we could build that entire timeline and then you can also go and begin your [INDISTINCT]
files and other pieces of binary data that might be of importance to your investigation
for your analysis. So, the last slide to kind of wrap this up, this is all kind of interesting
stuff, it’s Android, it’s iPhone, it’s whatever the different file systems are, and you can do this on
anything that’s out there. But the forensic space is kind of in the corner of security.
So, you’ve got security that sometimes sits at the side and then in the side of that,
we’re all the way off at the corner. So, we’re the guys that don’t get out of the lab that
often. And for a commercial forensics company, the traditional technique was do more investigations.
How do I get bigger? I do ten times the investigations and then ten times that and maybe someday
we could have a couple hundred employees doing investigations. There’s a change that’s happening
and we’d like to think that we’re kind of at the forefront of that. While we find the
forensic investigation, the hex analysis fascinating, what’s far more interesting is if you take
this reactive science of Forensics and say, “Let’s not call the Forensics guys in after
there’s an incident, let’s kind of invite them to the party ahead of time, you know,
we want to be in the nice offices and have the nice foosball tables and hang out with you
guys.” So, let’s get us out of the corner and move us into the proactive space. And
when you apply forensics in the proactive space, amazing things happen. And I want to
just give you a couple of quick examples. You can check this stuff out online and take
a look at it. The first thing that we do, we do some basic mobile apps security testing.
It’s low hanging stuff. I mean, it’s kind of the easiest of the easiest. So you go
out there, you take a device, you may have privileges on it, you may not need privileges
depending on what’s your view of content providers or what app comes in the backup utilities.
And you go out there and you look for data that’s sensitive and stored on the device
in an insecure fashion. Now, we’ve been doing this for a little while. We’ve got about
a hundred mobile app reviews out on our website. You can take a look, you can filter it, you
can see what applications are storing data in an insecure fashion. So what’s interesting
about this? Well, by using Forensics, we can spot different issues that we may say to the
development team, “Hey, there’s a better way to store this information.” Now, we can have
lots of debates about, well, if you’re storing information onto a device and you encrypt
it and the user does not type in a 32-character, you know, key every time they want to
access their SMS, have you really secured the information? And in the space and in the
mobile space, especially when you look at the threat to consumers, the main threat to
consumers are cyber criminals, people that want to steal their identity, they want to
get financial information. So what they go for is the easiest stuff, the lowest hanging
fruit. If they have to come in, compromise the device, perhaps root the–get in there,
find out what programs are running, try to pull the encryption keys out, get that data
off and then maybe get a user name or password–it’s way easier for them to just take all
of the different apps that store your username and password in plain text, they just copy
them all. So, there’s kind of this–yes, you can’t necessarily fully secure a device if
somebody gets root on it, but you can make it far more difficult for them. So that’s
one space when you apply Forensics to mobile app security and you take a look at what sort
of data exists on this device. I actually did a presentation down at an American Banker
conference, I think it was a week or so ago. So, there’s a lot more information about
this and if you hit that second link, you kind of go through the presentation and get
some more details on, “How do you apply Forensics to this space? What kind of information can
be recovered?” We have a very simple rating of pass or fail; something around 17% of
the apps passed. I think somewhere around 30 or so percent get a warning and almost
50% of the apps failed the most basic tests. And that’s with information that you would
typically consider private, that would be protected by a username and password, that we basically
can just pull off the device. So that’s kind of an interesting space, applying Forensics
proactively in the security space and saying, “What can we find out about these devices?”
So you change some of your development techniques and only store the information that really
needs to be stored there. If my Android device or my iPhone or whatever I happen to have
is always online, then why do we have to cache pieces of info? Now, there are applications
that require data to be cached. In those particular cases, you do a balance between security and
usability and a number of other things. But a lot of times, we’ll simply find information
that has no business being on the device and it’s just sitting there. So, that’s kind of
an interesting application in the mobile space. The, you know, the other space I want to talk
about is that when Forensics guys get called in, or incident response guys get called
in, we’ll come and we’ll look at a computer or we’ll look at a server and something happened.
It may be an hour ago, most likely it was a day or a week or a month ago and we basically
are told, “Hey, something happened. Can you help piece together the puzzle?” And we’re
actually really good at that. But it’s a really tough job, so we’ll come in and about 70 or 80%
of what we need to tell you what happened is gone. Network connections, RAM, link files,
somebody cleaned up after themselves, it’s gone, you can never get it back. Windows does a
great thing. Windows will only track the last time you plugged in a USB drive. And that’s
also only when it feels like it. Sometimes it just doesn’t track it at all. So we come back
and somebody says, “Well, we know this USB drive had sensitive info on it. How many times
did they connect it?” We can’t tell you. Windows doesn’t track it. So instead of coming in after
the milk has been spilled and trying to put Humpty Dumpty back together again, there’s
a totally different way to approach this problem. Forensic metadata is actually pretty tiny.
If you look at a registry file, it’s a couple [INDISTINCT] so, if you have a key server
that has potential information and they’ve taken all of it, don’t wait until you get
compromised, just pull those three megs off every day, every hour, every 15 minutes. You
guys know something about storing data and putting it in a database, and analyzing and
making sense of it, right? So, why not just go put that information somewhere? And so
that’s what we did. It’s one of the other interesting things, which we call Continuous Forensics Monitoring,
but the idea is let’s not wait until something happens. Let’s see if we can get ahead of
that. And now you’ve got an exact copy of everything that you need to know. And if something
happened a day or a week ago, guess what, we’ll just pull it up, yeah, all those USB
drives are connected, you guys missed it, we missed it, we weren’t monitoring it. And
I can tell you who, what, when, I can tell you the network connection, I can tell you
what happened. So it’s a really interesting space and I just wanted to share with you
guys to think about: if you kind of get into the Forensics stuff and you start tinkering,
it’s interesting to find what’s sitting on your device, but it’s far more interesting
to think about how can you take the Forensic Science and apply that proactively to security
or to improving development techniques so that we can come up with more efficient ways
or more effective and more secure ways to store information. So, you know, our big goal
is to get the Forensics guys out of the corner and we appreciate the opportunity to be with
you here today. We share tons of information on our website, we update our blogs, we have
lots of how to’s out there. So if anybody wants to talk to us about it, here’s how you
get a hold of us. And, you know, by all means, buy our books, give us a call and thanks so
much.

18 thoughts on “A Geek’s Guide to Digital Forensics”

  1. @disorganizedorg
    Not less secure, just more open. Security through obscurity is no defense.

    In any case the data comes off, just the wear leveling etc. changes the exact physical image of the device even if no writes are happening, which makes verifying the image less viable.

  2. SMART PEOPLE ARE NOT GEEKS. Jesus stop ruining my will to be a computer engineer with this retarded practice of calling yourselves "geeks".

  3. Why don't speakers can't go straight into the main topic! " Google invited me….bla bla bla…" disgusting! Like the video though!

  4. In a given YouTube video, viewers experience is vastly improved and no information is lost by skipping first 30 % of that video length. That's called the Wadsworth Constant 😛

  5. Forensic psychology and Forensic toxicology .
    Interesting books are "Guide to Information Sources in the Forensic Sciences", a really good read

  6. @If Only  A logical extraction is non-deleted information that is retrieved from the phone like texts, videos and call logs- just data that is stored locally on the phone. We have some more videos featuring Andrew and others if you'd like to check them out!

  7. Logical is like a partition on your PC hard drive. Instead of a physical acquisition, which is a bit for bit copy on the entire hard drive, including deleted, unallocated, etc., the logical acquisition retrieves the files on that particular section of the hard drive

  8. Does anyone know what is the major difference which occurs between FAT12/16 and FAT 32 which occurs during format?
