Azure Friday | Profiling Live Azure Web Apps with Application Insights


Hey, friends,
I’m Scott Hanselman. This is another episode
of Azure Friday. I’m here with Dan Taylor. And we’re learning all about
what we can do in the cloud. In the previous episode, we
talked about snapshot debugging and debugging things
in production. Now you’re gonna tell me how
I can make things faster in production.>>Right, so once your
application is running correctly in production, you want to have
a really good experience for all of the end users
of your application. Both the people who visit your
website on a daily basis, and the odd customer who might
hit some performance issues as well. So you want your app to be fast
and responsive for everyone. Cuz sometimes it can be very
frustrating when you have someone come and say, the
website was slow this one time. And you can’t really
reproduce that yourself.>>In my experience,
it’s surprising how powerful the profilers that we
have available are.>>Mm-hm.>>And it is surprising how
few people profile their code. Like really sit down and
find out where the hot path is.>>Yeah, I think
the views of the profiler are a little bit unfamiliar, and you have to think
about performance a little bit. Hopefully, I can explain a little bit
of how to do that. But really, the key is to know where
you’re spending time in code. And if you can figure out which
function on the call stack the time is going to, the performance issue
usually just falls right out. And I’ll show a couple of
examples of that today.>>Cool,
you have some live demos?>>Yeah, so I’ve got
this Fabrikam site here. And this is an app that
has been hanging around Microsoft as a demo app for
a while. And you know this, of course. So I actually took
this application. And usually we go and we try and
add a performance issue to the application to make
it pop in the profiler. And for this one,
I just decided to go in and say, this app has been around for
a while. There’s gotta be some
slow stuff in here. So I’ve got this
Service Tickets page. And I went in and
I did some analysis on this. So the prerequisite here is I’ve
got this application set up with Application Insights. So if I’m gonna go and look, here’s my Application Insights
resource here in Azure for this.>>Okay.>>And so
if I go to the Performance view in Application Insights,
right here. We’ve got the profiler
built into this. So if you’re
an Application Insights user, you have to take one step here. And that is to come to this
Configure tab here and just click
the Enable Profiler button. And then we detect
that it’s linked to the App Service web app that
it’s running in, and we install a profiler agent. And so what this does is it’ll
run a system-wide profiler trace for about two minutes
out of every hour. So a random sampling
in production.>>Okay, so
it’s not running all the time.>>It’s not running
all the time. But it’s designed for scale. So we have people with up to 500
instances running internally. And at that scale, the overhead
is barely noticeable. And you always have constant
streams of data coming out. So we’d like to think that if
you’ve got slow requests, eventually we’ll
capture them and they’ll just show up here
in the Performance view.>>I see, interesting. So if I’m a one-person shop and
I have a small site, it might not
necessarily hit stuff.>>It would take a little
bit longer maybe.>>Because, yeah,
it would have to be slow things happening within
that window of time.>>Mm-hm.
>>Okay, right, it makes sense.>>So
rather than have to sit down and go in with Visual Studio or
something like that and try to attach a profiler to
your production environment, you just sit back and the traces will
show up on a regular basis.>>Makes sense.>>So remember that
Service Tickets page there. So now that I turned
on the profiler, the profiler traces just show
up here in the performance blade next to the operations. So I wanna point out a few
different things before I look at the profiler traces for
these requests. We show a few different
metrics that help you decide what operations in
your application to look at. We have the count. So you might wanna look at your
most frequently visited pages, if you want your home page,
or the pages that users visit most often, to be fast. And we also have the median,
which is the typical request duration. We also show you the 95th
percentile, which says 5% of requests
were this slow or slower.>>Right, the outliers.>>Yeah, so outliers without
being true outliers. Cuz some of the true outliers
are stuff you really can’t control like disk swapping and
things like that. So let’s go ahead and click
on the profiler traces for this Service Tickets page. So I’m gonna click the View
button there next to that one. It’s got a median of
350 milliseconds and a 95th percentile of 568. So here it gives me the list of
all the profiler traces we’ve collected in the last 24 hours. That’s the current selected
range of time here. And on the left hand side, I get to pick which individual
example I wanna look at. So it tells me that this
448-millisecond one here was collected at 12 PM.>>Does this mean that
there were only four, or is it just picking
four good ones?>>Four in that bucket, four good ones from that bucket,
right.>>Okay.
>>And if you wanna see all of them you
can actually come up here and [CROSSTALK]
>>Okay, so there could have been
bajillions, I see.>>We have lots.>>You picked some
great exemplars.>>Yep, yep, so the right-hand side gives me
sort of the breakdown for each. It’s a call tree, so it’s all of the call stacks that
were sort of built over time, right, and so that’s why you
have a tree to navigate, right? So any particular point in
the tree kinda gives you, for that call stack, the
total amount of time it was running
during that request. And it also gives you sort
of this execution timeline, this graphical view of when that
was running from the start to the finish so I can see
the ordering of things. For example, if I come down here
I see that there was a database request here that was about 1.53
milliseconds at the very start of the request.>>Not bad.>>People always tell me it’s
the database that’s usually the problem for
the performance, but in this case that’s not true. But I’m just gonna kinda look
at the code here, and the tree expands right out to
the biggest item in the list, which is this handle
entity append-only item. And if I expand this here it
shows me how much of this time was spent running on the CPU,
which was about 106 milliseconds, right, and you know
we’re all about asynchrony, with asynchronous web requests
and things like that. But if you see time spent on
the CPU, that’s really bad for scalability, because
only one request can be using
a CPU core at a time.>>How do you judge, though? How do you look at that and say
okay, CPU time 100 milliseconds? That was a lot. Do you judge that based on well,
I know what I wanted it to accomplish and that number
feels like a lot, or doesn’t? Is this your way of saying
you’re in someone’s for loop right here?>>Yeah, so it really depends
on what your goals are. Are you trying to serve
1 million people, or 100 people that have
a heavy workload? In this case, at roughly 106 milliseconds of CPU per request,
I know that this limits me to about ten requests a second on this CPU
(1,000 ms divided by 106 ms is roughly 9.4 requests per second per core). So for me, I’d like to be able to serve
maybe 100 simultaneous users.>>So you need to figure out how
to do whatever work was being done when you hit
slash service tickets?>>Mm-hm.>>So if you need to go and dig
into handle entity append only, what was its intent, what
was that function’s job, and do it in less milliseconds?>>Right, so actually the way
I usually do this, and I’ve been working in
performance a long time, this is sort of
my go-to approach: I’ll just take this function name and
search for it with the word
slow at the end.>>[LAUGH] Are you serious?>>And then you end up
in some blog post, or Stack Overflow article over
here, because almost always, the most common performance issues, the
ones you’re likely to run into, are ones that people
have hit before.>>That makes sense.>>So if I go to this
article here and I read through the post, I’ll save
you sort of the reading here. Basically, this ended up
being that Entity Framework does a lot of change
tracking, right? And if you notice this page
here, it’s got lots of items.>>Wow, I’m looking at your
scroll bar, that’s huge. Maybe it should have some paging.>>Right, but we can actually
make this go faster because it turns out this is
a read-only query and I don’t need to do all that
change tracking.>>So you just say, I’m guessing
you just say disable Entity Framework’s change tracking.>>Right, so if I just come
into my code here, and this is what I learned from that
post, I just add AsNoTracking, and that cost will go away.>>And that 100
milliseconds becomes 10.>>Yep, exactly.>>Interesting, I think it’s funny that with
all of your experience, you’re saying, I’ve been
doing performance for many, many years, so what I do is I
type the function name and slow.>>Right.>>That’s your secret?>>That’s a really good secret,
in fact. Yeah.
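(For reference, the AsNoTracking fix described above is the standard Entity Framework pattern for read-only queries. Below is a minimal sketch; FabrikamContext and ServiceTicket are hypothetical stand-ins, not the actual Fabrikam Fiber code.)

```csharp
using System;
using System.Collections.Generic;
using System.Data.Entity;   // EF6; EF Core exposes the same AsNoTracking() extension
using System.Linq;

// Hypothetical stand-ins for the Fabrikam model.
public class ServiceTicket
{
    public int Id { get; set; }
    public string Title { get; set; }
    public bool Closed { get; set; }
    public DateTime CreatedDate { get; set; }
}

public class FabrikamContext : DbContext
{
    public DbSet<ServiceTicket> ServiceTickets { get; set; }
}

public static class TicketQueries
{
    // Read-only listing: AsNoTracking() tells Entity Framework not to set up
    // change tracking for the returned entities, which removes the per-entity
    // bookkeeping cost that showed up as CPU time in the profiler trace.
    public static List<ServiceTicket> GetOpenTickets(FabrikamContext db)
    {
        return db.ServiceTickets
                 .AsNoTracking()
                 .Where(t => !t.Closed)
                 .OrderByDescending(t => t.CreatedDate)
                 .ToList();
    }
}
```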
I’ve got one more example here. I’ll do the exact same thing. It really is about just
identifying where you’re spending that time in that code.>>It really is.>>And that’s what
the profiler helps you do. So again, if I look here,
there’s another 90 millisecond cost here: ActionLink,
LinkExtensions.ActionLink. Should we do it again?>>Yeah, please.>>Action link slow.>>[LAUGH]
>>Action link very slow.>>Right, and so they’re saying, okay, I’ve got action links
here, and I typed ‘for loop’.>>Yeah.
>>And here’s a tip here that if you
generate an action link once and you use string substitution
to replace the contents of the action link,
you save a lot of that time.>>So you’re generating
action link in a loop and you had hundreds, and hundreds,
and hundreds of them. You can do it
outside the loop and then just swap out the string
that’s inside the loop.>>That’s correct, and again,
I didn’t write this app, but this is all stuff that I
discovered by looking at this profile.>>Right here, and this is nice
because you could have come here with a demo that was like,
and I made this slow on purpose.>>Yeah, here’s my for
loop from 1 to 1 billion.>>You found really slow stuff,
legitimately slow stuff.>>And so there’s that action
link right there in a for loop.>>In a for loop, so you’d pull
it out, take the rendered HTML, and then swap it.>>That’s right.>>That’s very clever. Cool, I appreciate this.>>Yeah.
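(For reference, the pattern described above, generating the action link once and swapping the per-row values in as strings, looks roughly like the Razor view sketch below; the model, route values, and placeholder tokens are all hypothetical.)

```cshtml
@model IEnumerable<ServiceTicket>

@* Slow version: Html.ActionLink runs routing and URL generation for every row.
@foreach (var ticket in Model)
{
    <li>@Html.ActionLink(ticket.Title, "Details", "Tickets", new { id = ticket.Id }, null)</li>
}
*@

@* Faster version: build the link once with placeholder tokens, then do cheap
   string replacement inside the loop. "_ID_" and "_TITLE_" are hypothetical tokens. *@
@{
    var template = Html.ActionLink("_TITLE_", "Details", "Tickets",
                                   new { id = "_ID_" }, null).ToHtmlString();
}
@foreach (var ticket in Model)
{
    <li>@Html.Raw(template
        .Replace("_ID_", ticket.Id.ToString())
        .Replace("_TITLE_", Html.Encode(ticket.Title)))</li>
}
```

Encoding the title before substituting it keeps the markup safe while skipping the per-row routing and URL generation work.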
>>I also just wanna say that
even though there are great local profilers,
third-party profilers, built-in profilers that
you can run on your desktop, nothing is gonna beat
the reality of how it behaves in the cloud, and being able to see
this in a cloud-based profiler has a lot of power.>>Especially under load, too, when you have multiple
users hitting the site.>>That’s a good point,
under load.>>And the convenience is I
don’t have to set it up and take dumps. The profiles just show up for
me.>>Very cool, we are learning
all about how to get code-level analysis of slow requests in
production with Application Insights, here on Azure Friday. [MUSIC]
