Tag Archives: KQL

Kusto Detective Agency Season 2: Case 5 – Blast into the past

Challenges


  • Onboarding: Here
  • Challenge 1: Here
  • Challenge 2: Here
  • Challenge 3: Here
  • Challenge 4: Here
  • Challenge 5: This article
  • Challenge 6: Here
  • Challenge 7: Here
  • Challenge 8: Here
  • Challenge 9: Coming soon
  • Challenge 10: Coming soon

Prof. Smoke is back, and so is the mysterious El Puente, but an interview on a new secret KQL technology has gone missing! Luckily, we have all the tools to track down the missing file.

General advice

I must say the clues this time around felt very fluffy. They seem to be intentionally vague, with the meat of the tips presented in the “train me” content. I feel like this is a step in the wrong direction, as it’s sometimes helpful to have a better logic clue than just training.

Challenge: Case 5

Case 5 challenge text

Hey there!

It’s been a while since we talked, my friend. I’ve been keeping an eye on your remarkable detective work, and you never cease to amaze me. Well done! Just a friendly word of caution, though—I have a feeling you’re getting closer to the edge. Are you ready to take the leap?

Anyway, I’ve got an offer that’ll pique your interest. I’ve stumbled upon some seriously valuable intel about Kuanda org, and I know it could be a game-changer for the case you’re currently tackling. But before I speak, I need your assistance with a little something.

You know about Scott Hanselman’s incredible video podcasts, right? The guy’s a legend! He usually drops them every week, and they rack up thousands of views. But here’s the twist: Something mysterious went down with his 900th episode, slated for release a few weeks back. Rumor has it that Prof. Smoke, the renowned Big Data expert, spilled the beans on some top-secret functionality during the interview. The thing is, he had a change of heart and insisted the video be scrapped, so it never saw the light of day.

Now, hold on to your detective hat, because this is where it gets intriguing. I have a hunch that the interview was actually published before its deletion. Everyone assumed it was gone for good, but I managed to sneak my way into the public archive logs. These logs have all the juicy details—creation and replication timestamps, access records, and even deletion operations. My hope is that the deletions weren’t fully synced into the archive, leaving behind some remnants of that epic interview.

So, here’s the deal, my friend. If you can crack this case and unearth the elusive interview link, I promise to tell everything about Kuanda.org. Believe me, the information I have is pure gold.

Are you ready to dive headfirst into this thrilling challenge? Find that elusive video URL, and I’ll do everything in my power to dig up every last detail about Kuanda.org.

Let me know if you’re in, and let the game begin, detective!

Cheers,
El Puente

This case is awesome in that there are several different routes to arrive at the answer depending on your logical approach.

Query Hint

Depending on how you go about solving this puzzle, you can make use of several different KQL commands. The ones I found useful were parse, let and split, and I found it very helpful to tackle this challenge in stages.
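To show what parse does before diving in, here is a minimal sketch on a few made-up log lines. The sample rows, URLs and exact wording are my own assumptions, shaped like the event text used in the queries further down, so treat them as illustrative only.

```kql
// Hypothetical sample rows; the real StorageArchiveLogs text may differ in detail.
datatable(EventText:string)
[
    "Create blob transaction: 'https://example.blob.core.windows.net/videos/a.mp4' backup is created on https://backup.example.net/videos/a.mp4",
    "Read blob transaction: 'https://example.blob.core.windows.net/videos/a.mp4' read access (12 reads) were detected on the origin",
    "Delete blob transaction: 'https://example.blob.core.windows.net/videos/a.mp4' backup is completely removed"
]
// Pull the transaction type and the quoted blob URI out of the free text
| parse EventText with TransactionType " blob transaction: '" BlobURI "'" *
// Only read events carry a "(N reads)" counter; other rows get a null here
| parse EventText with * "(" Reads:long " reads)" *
// parse_url returns a dynamic bag; Host is one of its keys
| extend Host = tostring(parse_url(BlobURI).Host)
| project TransactionType, Host, Reads, BlobURI
```

Once the text is split into columns like this, the rest of the case is mostly summarize and filtering.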

Solution – Spoilers below

Chatting to the community on this one has been awesome. I’ve seen 4 completely unique approaches to solving this puzzle, and I’m going to talk about mine and one other below.

Query Case 5

//I tackled this challenge in a set of logical steps, looking for blobs that had no views and had been deleted, but not completely deleted. Using this in combination with hosts that had a dip in usage got me to several suspect files, one of which was the missing interview. Not elegant by any means, but it got the job done.

let hosts=
   StorageArchiveLogs
| parse EventText with TransactionType " blob transaction: '" BlobURI "'" *
| parse EventText with * "(" Reads:long "reads)" *
| extend Host = tostring(parse_url(BlobURI).Host)
| summarize Deletes=countif(TransactionType  == 'Delete'),
        Creates =countif(TransactionType == 'Create'),
        Reads=sumif(Reads, TransactionType == 'Read') by Host
| where Deletes > 0
| where Creates > 0
| where Creates > Deletes
| order by Reads
| take 50;
let Scott =
StorageArchiveLogs
| parse EventText with TransactionType " blob transaction: '" BlobURI "'" *
| parse EventText with * "(" Reads:long "reads)" *
| extend Host = tostring(parse_url(BlobURI).Host)
| where Host in~ (hosts)
| make-series Count=sumif(Reads, TransactionType == 'Read') default=0 on Timestamp step 12h by Host
| project series_periods_validate(Count,14,10), Host, Count
| extend Score = tostring(series_periods_validate_Count_scores[0])
| where Score != "0.0"
| project Host;
let deletes=
StorageArchiveLogs
| parse EventText with TransactionType " blob transaction:" *
| parse EventText with * "blob transaction: '" BlobURI "'" *
| where EventText has "completely"
| distinct BlobURI;
let uri=
StorageArchiveLogs
| parse EventText with TransactionType " blob transaction:" *
| parse EventText with * "' read access (" ReadCount:long " reads) were detected on the origin" *
| parse EventText with * "blob transaction: '" BlobURI "'" *
| extend Host = tostring(parse_url(BlobURI).Host)
| where Host in (Scott)
| project EventText, BlobURI, TransactionType, ReadCount, Host
| summarize Deletes=countif(TransactionType  == 'Delete'),
        Creates =countif(TransactionType == 'Create'),
        Reads=sumif(ReadCount, TransactionType == 'Read') by Host, BlobURI
| sort by Reads asc
| where Deletes > 0
| where Creates > 0
| distinct BlobURI;
StorageArchiveLogs
| parse EventText with TransactionType " blob transaction:" *
| parse EventText with * "blob transaction: '" BlobURI "'" *
| where BlobURI in~ (uri) and BlobURI !in (deletes)
| where EventText has_all ("backup","create")
| distinct EventText
| extend SplitAll=split(EventText, ' ')
| extend Backup=tostring(SplitAll[8])
| project-away  EventText, SplitAll
| where Backup contains "mp4"

//A far neater approach by my colleague Nabeel Prior

// get the blob accounts that had 1 new creation each week for 4 weeks
let weeklyCreates =
StorageArchiveLogs
| where EventText startswith "Create blob"
| project
    createTS=Timestamp,
    blobAccount=substring(EventText, indexof(EventText, "https://"), indexof(EventText, "/", 1, 1000, 3) - indexof(EventText, "https://"))
| summarize accountCreatesPerWeek = count() by blobAccount, week_of_year(createTS)
| where accountCreatesPerWeek == 1
| summarize count() by blobAccount
| where count_ >= 4;
// get all the creates for the above blob accounts
let allCreates =
StorageArchiveLogs
| where EventText startswith "Create blob"
| project
    blobAccount=substring(EventText, indexof(EventText, "https://"), indexof(EventText, "/", 1, 1000, 3) - indexof(EventText, "https://")),
    createUrl=substring(EventText, indexof(EventText, "https://"), indexof(EventText, "' backup is created on ") - indexof(EventText, "https://")),
    backupUrl=tostring(split(replace_string(EventText, "Create blob transaction: '", ""), "' backup is created on ", 1))
| join kind=inner weeklyCreates on $left.blobAccount == $right.blobAccount;
// for all creates above, find those that were partially deleted
StorageArchiveLogs
| where EventText startswith "Delete blob"
| extend deleteUrl=substring(EventText, indexof(EventText, "https://"), indexof(EventText, "' backup is") - indexof(EventText, "https://"))
| extend deleteAction=iif(EventText contains "completely", 1, 0)
| join kind=inner allCreates on $left.deleteUrl == $right.createUrl
| summarize completelyDeleted=sum(deleteAction), firstDelete=min(Timestamp), lastDelete=max(Timestamp), numberOfDeletes=countif(deleteAction == 0) by deleteUrl, backupUrl
| where completelyDeleted==0;

This week’s puzzle was great fun, and I really enjoyed the discussions about the various approaches to solving it. Well done detectives, and keep an eye out for El Puente because we don’t know where they may strike next!


Kusto Detective Agency Season 2: Case 3 – Return Stolen cars!

Challenges


  • Onboarding: Here
  • Challenge 1: Here
  • Challenge 2: Here
  • Challenge 3: This article
  • Challenge 4: Here
  • Challenge 5: Here
  • Challenge 6: Here
  • Challenge 7: Here
  • Challenge 8: Here
  • Challenge 9: Coming soon
  • Challenge 10: Coming soon

There sure are a lot of strange things happening in Digitown at the moment. This time cars are being stolen and it’s up to us to try and catch the thieves! This was an enjoyable case that requires a great spread of KQL and puzzle solving to catch those crooks.

General advice

If you have completed season 1 this case may seem familiar to you, except this time, there’s a twist. The clues are quite good this time around and the training has improved. Tackling this one in stages can make it quite a bit easier to crack this case.

Challenge: Case 3

Case 3 challenge text

Hey there Detective,

We’ve got an urgent case that needs your expertise! There has been a sudden increase in unsolved cases of stolen cars all across our city, and the police need our help again to crack the case.

We’ve been given access to a massive dataset of car traffic for over a week, as well as a set of cars that have been stolen. It’s possible that the car’s identification plates were replaced during the robbery, which makes this case even more challenging.

We need you to put on your detective hat and analyze the data to find any patterns or clues that could lead us to the location of these stolen cars. It is very likely that all the stolen cars are being stored in the same location.

Time is of the essence, and we need to find these cars before they are sold or taken out of the city. The police are counting on us to solve this case, and we can’t let them down!

Are you up for the challenge, detective? We know you are! Let’s get to work and crack this case wide open!

Best regards,
Captain Samuel Impson.

Time to get to work and track those car thieves

Query Hint

This case is set up to use more logic than assumptions. Think about how you would find out where the cars are being taken to have their VIN numbers changed. Check out these KQL commands for some help: arg_max, join and make_list.

Solution – Spoilers below

This solve can be done more optimally, but I did it in two steps.

Query Case 3

//First, we need to know where the VIN numbers are being changed. Luckily, we can track all of the stolen cars relatively easily at first, and we’ll find two locations the cars are being taken to.

CarsTraffic
| join kind = inner (StolenCars)
  on VIN
| summarize arg_max(Timestamp, *) by VIN
| order by Ave
| summarize count(VIN) by Street, Ave
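If the join and arg_max combination is new to you, here is a small sketch of the same idea on made-up data. The VINs, times and coordinates are invented for illustration; only the column names follow the query above.

```kql
// Hypothetical traffic sightings; values are invented for this example.
let Traffic = datatable(Timestamp:datetime, VIN:string, Ave:int, Street:int)
[
    datetime(2023-06-01 08:00), "AAA111", 10, 20,
    datetime(2023-06-01 09:00), "AAA111", 12, 25,
    datetime(2023-06-01 08:30), "BBB222", 40, 41,
    datetime(2023-06-01 10:00), "BBB222", 12, 25
];
let Stolen = datatable(VIN:string) ["AAA111", "BBB222"];
Traffic
| join kind=inner Stolen on VIN                       // keep only sightings of stolen cars
| summarize arg_max(Timestamp, Ave, Street) by VIN    // last known sighting per car
| summarize Cars = count(), VINs = make_list(VIN) by Ave, Street
```

Both toy cars are last seen at Ave 12, Street 25, so the final summarize returns a single row with two cars, which is the kind of clustering we are looking for in the real data.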

//Now comes the tricky part: we need to find cars leaving these locations with unknown VIN numbers and figure out where the stolen cars are being taken. What we do know is how many stolen cars we are looking for. Well, look at that, a suspicious location!

let Suspects =
CarsTraffic
| summarize arg_min(Timestamp, *) by VIN
| where (Street == 86 and Ave == 223) or (Street == 251 and Ave == 122)
| summarize mylist = make_list(VIN);
CarsTraffic
| where VIN in (Suspects)
| summarize arg_max(Timestamp, *) by VIN
| summarize Vins = count(VIN) by Ave, Street
| where Vins == 20

It was only a matter of time before these thieves were brought to justice. These cases are getting more and more exciting, I wonder where the next one will take us. As always, great work detectives!


Kusto Detective Agency Season 2: Case 1 – To bill or not to bill?

Challenges


  • Onboarding: Here
  • Challenge 1: This article
  • Challenge 2: Here
  • Challenge 3: Here
  • Challenge 4: Here
  • Challenge 5: Here
  • Challenge 6: Here
  • Challenge 7: Here
  • Challenge 8: Here
  • Challenge 9: Coming soon
  • Challenge 10: Coming soon

In this first case we’re asked to solve a billing problem, not the most exciting thing but certainly interesting with some real-world applications for the use of data. I quite enjoyed this challenge as it reminded me to keep things simple and not discount any ideas as silly just yet.

General advice

For this case the wording tripped me up a little bit, so make sure you understand what’s being asked and check out the training if necessary. I will say, while I like the idea of the training, it eventually put me on the wrong track, so use it but also keep an open mind.

Challenge: Case 1

Case 1 challenge text

Dear Detective,

Welcome to the Kusto Detective Agency! We’re thrilled to have you on board for an exciting new challenge that awaits us. Get ready to put your detective skills to the test as we dive into a perplexing mystery that has struck Digitown.

Imagine this: It’s a fresh new year, and citizens of Digitown are in an uproar. Their water and electricity bills have inexplicably doubled, despite no changes in their consumption. To make matters worse, the upcoming mayoral election amplifies the urgency to resolve this issue promptly.

But fear not, for our esteemed detective agency is on the case, and your expertise is vital to crack this mystery wide open. We need your keen eye and meticulous approach to inspect the telemetry data responsible for billing, unravel any hidden errors, and set things right.

Last year, we successfully served Mayor Gaia Budskott, leaving a lasting impression. Impressed by our work, the city has once again turned to us for assistance, and we cannot afford to disappoint our client.

The city’s billing system utilizes SQL (an interesting choice, to say the least), but fret not, for we have the exported April billing data at your disposal. Additionally, we’ve secured the SQL query used to calculate the overall tax. Your mission is to work your magic with this data and query, bringing us closer to the truth behind this puzzling situation.

Detective, we have complete faith in your abilities, and we are confident that you will rise to the occasion. Your commitment and sharp instincts will be instrumental in solving this enigma.

Sincerely,
Captain Samuel Impson.

Right let’s get down to business and get the citizens of Digitown their correct bills!

Query Hint

There are two things wrong with the billing run this month and you’ll have to find both to get the right answer. KQL commands that will be useful are arg_min and distinct.

There is a bit of an investigation that needs to be done to uncover the issues with the data and there are various angles you can take, such as looking at specific houses or dates just to name a couple.

Solution – Spoilers below

Have you found the two things wrong with the billing?

Query Case 1

//The most obvious issue is that some customers are being double billed, so we need to remove those duplicates. Also, it turns out that some customers are using negative water and electricity, which doesn’t seem possible, so let’s get rid of that too.

Consumption
| where Consumed > 0  
| distinct Timestamp,HouseholdId,MeterType, Consumed
| summarize TotalConsumed = sum(Consumed) by MeterType  
| lookup Costs on MeterType  
| extend TotalCost = TotalConsumed*Cost  
| summarize sum(TotalCost)
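
To see both fixes in action, here is the same pipeline on a tiny made-up data set. The readings and rates below are invented, and Readings and Rates are hypothetical stand-ins for the Consumption and Costs tables.

```kql
// Hypothetical readings: one exact duplicate and one negative value.
let Readings = datatable(Timestamp:datetime, HouseholdId:string, MeterType:string, Consumed:real)
[
    datetime(2023-04-01), "H1", "Water",       10.0,
    datetime(2023-04-01), "H1", "Water",       10.0,
    datetime(2023-04-01), "H2", "Water",       -5.0,
    datetime(2023-04-01), "H2", "Electricity",  3.0
];
// Hypothetical unit costs per meter type.
let Rates = datatable(MeterType:string, Cost:real) ["Water", 2.0, "Electricity", 0.5];
Readings
| where Consumed > 0                                        // drop impossible negative readings
| distinct Timestamp, HouseholdId, MeterType, Consumed      // collapse exact duplicates
| summarize TotalConsumed = sum(Consumed) by MeterType
| lookup Rates on MeterType
| extend TotalCost = TotalConsumed * Cost
| summarize sum(TotalCost)
```

With these numbers the duplicate water reading is counted once and the negative one is ignored, giving 10 * 2.0 + 3 * 0.5 = 21.5.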

Great work detectives! This case gave me a nice opportunity to stretch my “KQL legs” and I found it to be a fun experience. I’m definitely looking forward to the next one.


Kusto Detective Agency Season 2 – Onboarding

Challenges


  • Onboarding: This article
  • Challenge 1: Here
  • Challenge 2: Here
  • Challenge 3: Here
  • Challenge 4: Here
  • Challenge 5: Here
  • Challenge 6: Here
  • Challenge 7: Here
  • Challenge 8: Here
  • Challenge 9: Coming soon
  • Challenge 10: Coming soon

It’s exciting to have another season of the Kusto Detective Agency. This is an excellent way to learn KQL and gain skills that are useful with many Microsoft products, including Azure Monitor, Sentinel, M365 Defender and Azure Data Explorer (ADX), to name a few.

General advice

If, like me, you’re still in full detective mode from last season, then take a moment to reset your “complexity level”. We’re starting again with the basics and it’s best to approach these 10 challenges in that way, from simple to complex.

Challenge: Onboarding

Onboarding challenge text

If you have been here for Season 1, you may be surprised to find yourself as a Rookie again. You see, it’s all about innovation and hitting refresh. So, it’s a fresh start for everyone. Yet we believe in excellence and that’s why we need your detective skills to unveil the crème de la crème of detectives from the past year, 2022. This is like the ultimate leaderboard challenge where we crown the “Most Epic Detective of the Year.” Exciting, right?

Imagine our agency as a buzzing beehive, like StackOverflow on steroids. We have a crazy number of cases popping up every day, each with a juicy bounty attached (yes, cold, hard cash!). And guess what? We’ve got thousands of Kusto Detectives scattered across the globe, all itching to pick a case and earn their detective stripes. But here’s the catch: only the first detective to crack the case gets the bounty and major street cred!

So, your mission, should you choose to accept it, is to dig into the vast archives of our system operation logs from the legendary year 2022. You’re on a quest to unearth the absolute legend, the detective with the biggest impact on our business—the one who raked in the most moolah by claiming bounties like a boss!

Feeling a bit rusty or want to level up your Kusto skills? No worries, my friend. We’ve got your back with the “Train Me” section. It’s like a power-up that’ll help you sharpen your Kusto-fu to tackle each case head-on. Oh, and if you stumble upon a mind-boggling case and need a little nudge, the “Hints” are there to save the day!

Now, strap on your detective hat, embrace the thrill, and get ready to rock this investigation. The fate of the “Most Epic Detective of the Year” rests in your hands!

Good luck, rookie, and remember to bring your sense of humor along for this wild ride!

Lieutenant Laughter

To get started we simply need to identify the detective who won the most bounties from season 1. Luckily, we have everything we need to get started.

Query Hint

There are a couple of key pieces of information we need to solve this:

  1. We know there are different IDs for each detective
  2. Only the first detective with the correct solution can claim the bounty
  3. Who has the most bounties?

KQL commands that will be useful to achieve this are extend, summarize arg_min and join.

Solution – Spoilers below

To solve this, we need to find out the bounty for each case and then join that with the winner of each case.

Query Onboarding


//Who is the winner
let Bounties =
DetectiveCases
| extend Bounty = toint(Properties.Bounty)
| project CaseId, Bounty;
let Winner =
DetectiveCases
| where EventType == "CaseSolved"
| summarize arg_min(Timestamp, DetectiveId) by CaseId;
//Join the winners straight to the bounties so each case's bounty is only counted once
Winner
| join kind=inner Bounties on CaseId
| summarize sum(Bounty) by DetectiveId
| top 1 by sum_Bounty desc
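
For anyone who wants to see the "first solver wins" logic on something small, here is a sketch with invented data. The case IDs, detective IDs and the "CaseOpened" event name are assumptions on my part; only the column names come from the query above.

```kql
// Hypothetical case log: C1 is solved twice, only the first solve should count.
let Cases = datatable(Timestamp:datetime, EventType:string, DetectiveId:string, CaseId:string, Properties:dynamic)
[
    datetime(2022-01-01 09:00), "CaseOpened", "",     "C1", dynamic({"Bounty": 100}),
    datetime(2022-01-01 10:00), "CaseSolved", "kvc1", "C1", dynamic({}),
    datetime(2022-01-01 11:00), "CaseSolved", "kvc2", "C1", dynamic({}),
    datetime(2022-01-02 09:00), "CaseOpened", "",     "C2", dynamic({"Bounty": 250}),
    datetime(2022-01-02 12:00), "CaseSolved", "kvc2", "C2", dynamic({})
];
let Bounties =
    Cases
    | extend Bounty = toint(Properties.Bounty)
    | where isnotnull(Bounty)            // keep only rows that actually carry a bounty
    | project CaseId, Bounty;
Cases
| where EventType == "CaseSolved"
| summarize arg_min(Timestamp, DetectiveId) by CaseId   // earliest solver per case
| join kind=inner Bounties on CaseId
| summarize sum(Bounty) by DetectiveId
| top 1 by sum_Bounty desc
```

Here kvc1 wins C1 (100) because they solved it first, kvc2 wins C2 (250), so the top row is kvc2 with 250.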

Bonus answer in 4 lines of code

Turns out the detective with the most bounties is also the detective with the most entries. Which just goes to show you, there are different ways to get the right answers.

DetectiveCases
| summarize count() by DetectiveId
| where isnotempty(DetectiveId)
| top 1 by count_

All in all, I’m glad season 2 is here and I am excited to crack these cases. Good luck detectives, and welcome aboard!


Kusto Detective Agency Season 2 is here!

Welcome back detectives, to a new exciting season of Kusto Detective Agency! This time around there are 10 cases to solve and some new tools to help you sharpen those KQL skills!

What is it?

The Kusto Detective Agency is a set of challenges designed to help you learn the Kusto Query Language (KQL), which is the language used by several Azure services including Azure Monitor, Sentinel, M365 Defender and Azure Data Explorer (ADX), to name a few. The challenges are gamified and interactive, and consist of different exciting cases across two seasons.

Each case has a different scenario that you need to solve using KQL queries, where you can earn badges, and they get progressively more difficult as you help the citizens of Digitown.

Season 1 is still available, and I talk about my experience with those challenges here.

Where can I get started?

It’s easy to get started: just create your free ADX cluster and report for duty at the detective agency!

Access the challenges here – https://detective.kusto.io/
Create your free ADX cluster here – https://aka.ms/kustofree

What’s new?

Hints return from season 1, but the new and exciting feature is a set of training that you can complete to prepare you for each case. This highlights specific commands and techniques that are relevant to solving the various puzzles. Just click “Train me for the case” to get started.

My thoughts

KQL is very valuable considering all of the products that make use of the language, and being able to write a basic query does make working with those products much easier. Learning in this gamified way also makes the process more interesting, and if the cases from season 2 are anything like season 1 we’re in for a lot of fun. I will be documenting my experience with season 2 and would highly recommend the Kusto Detective Agency for anyone who could benefit from KQL skills.


Uncovering Anomalies in Time-series Data with Kusto Query Language (KQL)

Anomaly detection is a crucial task in monitoring the performance of various systems. In this blog post, we will discuss how to use Kusto Query Language (KQL) to detect anomalies in CPU performance data.

Spikes

One of the most common types of anomalies is spikes in the data. Spikes occur when the data deviates significantly from its normal behavior. To detect spikes in CPU usage over time, we can use the following KQL query:

let window = 24h;
Perf
| where ObjectName == "Processor" and CounterName == "% Processor Time"
| where TimeGenerated > ago(window)
| summarize avg(CounterValue), stdev(CounterValue), max(CounterValue) by bin(TimeGenerated, 2h), Computer
| where (max_CounterValue - avg_CounterValue) > 3 * stdev_CounterValue

This query first filters the data to include only CPU usage data from the last 24 hours. It then groups the data by 2-hour time window and computer, calculates the average, standard deviation and maximum for each group, and finally keeps only the windows whose peak value is more than 3 standard deviations above that window's average.
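
As a quick sanity check of the 3 standard deviation rule, here is a self-contained sketch on synthetic data, so it runs without a Perf table. The computer name and values are made up: 19 samples at 10% and a single sample at 100%.

```kql
// Synthetic samples: a flat 10% load with one spike to 100%.
range i from 1 to 20 step 1
| extend Computer = "demo-vm", CounterValue = iff(i == 13, 100.0, 10.0)
| summarize avg_CounterValue = avg(CounterValue),
            stdev_CounterValue = stdev(CounterValue),
            max_CounterValue = max(CounterValue) by Computer
// Flag the group when its peak sits more than 3 standard deviations above the mean
| extend IsSpike = (max_CounterValue - avg_CounterValue) > 3 * stdev_CounterValue
```

The average here is 14.5 and the standard deviation is roughly 20, so the peak of 100 clears the threshold and IsSpike comes back true. Note that with very few samples a single value can never be 3 standard deviations from the mean, so the window needs a reasonable number of data points for this test to be meaningful.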

Outliers

Another type of anomaly is outliers. Outliers are data points that are significantly different from the rest of the data. To detect outliers in CPU usage across different machines, we can use the following KQL query:

Perf
| where ObjectName == "Processor" and CounterName == "% Processor Time"
| summarize percentile(CounterValue,75) by Computer
| where percentile_CounterValue_75 > 50

This query filters the data to include only CPU usage data, calculates the 75th percentile of the data for each computer, and then shows only the computers whose 75th percentile value is higher than 50.

Changes over time

Finally, another type of anomaly is changes in the data over time. To detect changes in CPU usage over time, we can use the following KQL query:

let window = 7d;
Perf
| where ObjectName == "Processor" and CounterName == "% Processor Time"
| where TimeGenerated > ago(window)
| summarize avg_CounterValue = avg(CounterValue) by Computer, Day = startofday(TimeGenerated)
| order by Computer asc, Day asc
// Previous day's average for the same computer (null on each computer's first day)
| extend prev_avg = iff(prev(Computer) == Computer, prev(avg_CounterValue), real(null))
| extend diff = avg_CounterValue - prev_avg
| where abs(diff) > 10

This query filters the data to include only CPU usage data from the last 7 days. It then groups the data by day and computer, calculates the daily average, and uses prev() to find the difference between consecutive days’ averages for the same computer. Finally, the query keeps only the days where the average changed by more than 10 percentage points.

Summary

In this blog post, we have discussed how to use KQL to detect different types of anomalies in CPU performance data. These queries can be customized and adjusted to fit the specific needs of your system and can be a valuable tool in monitoring and maintaining the performance of your systems. Anomaly detection can be complex but is also very powerful.


Kusto Detective Agency: Challenge 5 – Big heist

Challenges

The ADX team upped their game once again. Time for a proper forensic investigation, track down the baddies, find clues and decipher their meaning all while racing against the clock. Can you come up with the date and location of the heist in time to stop them?

General advice

This challenge requires a bit of creative thinking. Even with the hints there are multiple paths to go down, and not all of them are going to lead to the right outcome. The key to this one: keep it simple and logical.

Challenge 5: Big heist

This challenge also has multiple parts. First we need to identify four chatroom users from over three million records, and then we need to “hack” their IPs to get more clues.

Query Hint Part 1

Trying to identify the right user behaviors here is super tricky, and I got tripped up by adding a level of complexity that was unnecessary. At its simplest, we have to find a room that only 4 people joined and no one else. Some KQL commands that will be useful here are tostring, split, extend and row_cumsum.

Big heist challenge text - Part 1

Hello. It’s going to happen soon: a big heist. You can stop it if you are quick enough. Find the exact place and time it’s going to happen.
Do it right, and you will be rewarded, do it wrong, and you will miss your chance.

Here are some pieces of the information:
The heist team has 4 members. They are very careful, hide well with minimal interaction with the external world. Yet, they use public chat-server for their syncs. The data below was captured from the chat-server: it doesn’t include messages, but still it may be useful. See what you can do to find the IPs the gang uses to communicate.
Once you have their IPs, use my small utility to sneak into their machine’s and find more hints:
https://sneakinto.z13.web.core.windows.net/<ip>

Cheers
El Puente

PS:
Feeling uncomfortable and wondering about an elephant in the room: why would I help you?
Nothing escapes you, ha?
Let’s put it this way: we live in a circus full of competition. I can use some of your help, and nothing breaks if you use mine… You see, everything is about symbiosis.
Anyway, what do you have to lose? Look on an illustrated past, fast forward N days and realize the future is here.

Query challenge 5 - Part 1

let rooms =
ChatLogs
| where Message contains "joined"
| extend user = tostring(split(Message," ",1))
| extend chan = tostring(split(Message," ",5))
| distinct user, chan
| summarize count() by chan
| where count_ == 4
| project chan;
let chatroom =
ChatLogs
| extend action = tostring(split(Message," ",2))
| where action contains "joined" or action contains "left"
| extend A1 = iif(action contains "joined", 1, -1)
| extend user = tostring(split(Message," ",1))
| extend chan = tostring(split(Message," ",5))
| where chan in (rooms)
| order by Timestamp asc
| extend total=row_cumsum(A1, chan != prev(chan))
| where total ==4
| distinct chan;
let users =
ChatLogs
| extend chan = tostring(split(Message," ",5))
| where chan in (chatroom)
| extend user = tostring(split(Message," ",1))
| distinct user;
ChatLogs
| extend user = tostring(split(Message," ",1))
| where user in (users)
| where Message contains "logged"
| extend IP = tostring(split(Message," ",5))
| distinct IP
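
The row_cumsum trick in the middle of that query is easier to follow on a toy example. The sketch below uses invented users and channels that are already split into columns, and it orders by channel first so that each room's running total restarts cleanly; that ordering is my choice for the illustration.

```kql
// Hypothetical join/leave events for two rooms.
datatable(Timestamp:datetime, user:string, action:string, chan:string)
[
    datetime(2022-10-01 10:00), "u1", "joined", "c1",
    datetime(2022-10-01 10:01), "u2", "joined", "c1",
    datetime(2022-10-01 10:02), "u1", "left",   "c1",
    datetime(2022-10-01 10:03), "u3", "joined", "c1",
    datetime(2022-10-01 10:00), "u4", "joined", "c2",
    datetime(2022-10-01 10:05), "u5", "joined", "c2",
    datetime(2022-10-01 10:06), "u6", "joined", "c2"
]
| order by chan asc, Timestamp asc
| extend delta = iff(action == "joined", 1, -1)             // +1 on join, -1 on leave
| extend occupancy = row_cumsum(delta, chan != prev(chan))  // restart the running total for each room
| summarize peak = max(occupancy) by chan
```

Room c1 never has more than 2 people in it at once while c2 peaks at 3, which is the same kind of "how many were in the room together" signal used to find the gang.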

Alright, we’ve got some IPs, so it’s time to “hack”. Using the provided tool you’ll gather a set of clues from each of the gang members. There are a few key things you need to find: an email, some pictures, a cypher tool, an article and a pdf copy of it, and of course a video from the nefarious professor Smoke.

From here on out it’s all investigative skills, you now have everything you need to find the date and location of the heist and save that datacenter!

Final hint

In order to decrypt the secret message, you’re going to need a special key. The format looks familiar, right? Spot on, you’ll need the answer from challenge 4!

Congratulations Detective!

If you’ve found this blog series useful, please let me know via LinkedIn or drop a comment below. These challenges have been super fun and I for one am looking forward to season 2!


Kusto Detective Agency: Challenge 4 – Ready to play?

Challenges

Just when you thought these challenges couldn’t get any cooler, along comes your very own nemesis and a multi-part puzzle taking you on a street tour of New York City.

General advice

First, we need to import the data ourselves this time around, using Ingest from Blob under our data blade. You can also change the column name; I used “Primes”.
Calculating the prime numbers can be a little tricky, as our free ADX cluster requires us to be clever with our query in order to allow it to complete. Luckily, we get a free lesson on “special primes”.

Challenge 4: Ready to play?

This challenge has two parts and we’ll look at them in turn. First we need to identify a specific prime number and use that to get the second clue, and then we have to find a specific area in New York City.

Query Hint Part 1

Calculating the largest special prime under 100M can be done in a variety of ways; the trick is working within the limited capacity of our free ADX cluster. KQL commands that are useful are serialize, prev, next and join.

Ready to play? challenge text - Part 1


Hello. I have been watching you, and I am pretty impressed with your abilities of hacking and cracking little crimes.
Want to play big? Here is a prime puzzle for you. Find what it means and prove yourself worthy.

20INznpGzmkmK2NlZ0JILtO4OoYhOoYUB0OrOoTl5mJ3KgXrB0[8LTSSXUYhzUY8vmkyKUYevUYrDgYNK07yaf7soC3kKgMlOtHkLt[kZEclBtkyOoYwvtJGK2YevUY[v65iLtkeLEOhvtNlBtpizoY[v65yLdOkLEOhvtNlDn5lB07lOtJIDmllzmJ4vf7soCpiLdYIK0[eK27soleqO6keDpYp2CeH5d\F\fN6aQT6aQL[aQcUaQc[aQ57aQ5[aQDG

Start by grabbing Prime Numbers from
https://kustodetectiveagency.blob.core.windows.net/prime-numbers/prime-numbers.csv.gz and educate yourself on Special Prime numbers (https://www.geeksforgeeks.org/special-prime-numbers), this should get you to
https://aka.ms/{Largest special prime under 100M}

Once you get this done – you will get the next hint.

Cheers,
El Puente.

Query challenge 4 - Part 1

//Method 1 – This query will calculate the largest special prime under 100M by building each candidate from two neighbouring primes and testing it with trial division

Challenge4
| serialize
| order by Primes asc
| extend prevA = prev(Primes,1)
| extend NextA = next(prevA,1)
| extend test =  prevA + NextA + 1
| where test % 2 != 0 // skip even numbers
| where test < 100000000 and test > 99999000 // only test candidates near the top of the range
| extend divider = range(3, test/2, 2) // divider candidates
| mv-apply divider to typeof(long) on
(
  summarize Dividers=countif(test % divider == 0) // count dividers
)
| where Dividers == 0 // prime numbers don’t have dividers
| top 1 by test

//Method 2 – This query will calculate the largest special prime under 100M by comparing special prime candidates to the data set of all prime numbers

Challenge4
| serialize
| project specialPrime = prev(Primes) + Primes + 1
| join kind=inner (Challenge4) on $left.specialPrime == $right.Primes
| where specialPrime < 100000000
| top 1 by Primes desc
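
To convince yourself that the join approach really picks out special primes, here is the same idea on a hand-typed list of the primes up to 43. SmallPrimes is a hypothetical stand-in for the ingested Challenge4 table.

```kql
// Hypothetical stand-in for the ingested prime table.
let SmallPrimes = datatable(Primes:long) [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43];
SmallPrimes
| order by Primes asc                                   // ordering also serializes the rows for prev()
| extend specialPrime = prev(Primes) + Primes + 1       // two neighbouring primes plus 1
| join kind=inner SmallPrimes on $left.specialPrime == $right.Primes   // keep candidates that are prime
| project specialPrime
| order by specialPrime asc
```

This returns 13, 19, 31, 37 and 43 (for example 5 + 7 + 1 = 13 and 19 + 23 + 1 = 43), while candidates such as 3 + 5 + 1 = 9 drop out because they are not in the prime list.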



Now that we have our prime number we can move on to part 2.

Largest special prime under 100M

The number we want is 99999517, so we can now head over to http://aka.ms/99999517

A-ha, a message from our nemesis! We need to meet them in a specific area marked by certain types of trees.

Ready to play? challenge text - Part 2

Well done, my friend.
It's time to meet. Let's go for a virtual sTREEt tour...
Across the Big Apple city, there is a special place with Turkish Hazelnut and four Schubert Chokecherries within 66-meters radius area.
Go 'out' and look for me there, near the smallest American Linden tree (within the same area).
Find me and the bottom line: my key message to you.

Cheers,
El Puente.

PS: You know what to do with the following:

----------------------------------------------------------------------------------------------

.execute database script <|
// The data below is from https://data.cityofnewyork.us/Environment/2015-Street-Tree-Census-Tree-Data/uvpi-gqnh 
// The size of the tree can be derived using 'tree_dbh' (tree diameter) column.
.create-merge table nyc_trees 
       (tree_id:int, block_id:int, created_at:datetime, tree_dbh:int, stump_diam:int, 
curb_loc:string, status:string, health:string, spc_latin:string, spc_common:string, steward:string,
guards:string, sidewalk:string, user_type:string, problems:string, root_stone:string, root_grate:string,
root_other:string, trunk_wire:string, trnk_light:string, trnk_other:string, brch_light:string, brch_shoe:string,
brch_other:string, address:string, postcode:int, zip_city:string, community_board:int, borocode:int, borough:string,
cncldist:int, st_assem:int, st_senate:int, nta:string, nta_name:string, boro_ct:string, ['state']:string,
latitude:real, longitude:real, x_sp:real, y_sp:real, council_district:int, census_tract:int, ['bin']:int, bbl:long)
with (docstring = "2015 NYC Tree Census")
.ingest async into table nyc_trees ('https://kustodetectiveagency.blob.core.windows.net/el-puente/1.csv.gz')
.ingest async into table nyc_trees ('https://kustodetectiveagency.blob.core.windows.net/el-puente/2.csv.gz')
.ingest async into table nyc_trees ('https://kustodetectiveagency.blob.core.windows.net/el-puente/3.csv.gz')
// Get a virtual tour link with Latitude/Longitude coordinates
.create-or-alter function with (docstring = "Virtual tour starts here", skipvalidation = "true") VirtualTourLink(lat:real, lon:real) { 
	print Link=strcat('https://www.google.com/maps/@', lat, ',', lon, ',4a,75y,32.0h,79.0t/data=!3m7!1e1!3m5!1s-1P!2e0!5s20191101T000000!7i16384!8i8192')
}
// Decrypt message helper function. Usage: print Message=Decrypt(message, key)
.create-or-alter function with 
  (docstring = "Use this function to decrypt messages")
  Decrypt(_message:string, _key:string) { 
    let S = (_key:string) {let r = array_concat(range(48, 57, 1), range(65, 92, 1), range(97, 122, 1)); 
    toscalar(print l=r, key=to_utf8(hash_sha256(_key)) | mv-expand l to typeof(int), key to typeof(int) | order by key asc | summarize make_string(make_list(l)))};
    let cypher1 = S(tolower(_key)); let cypher2 = S(toupper(_key)); coalesce(base64_decode_tostring(translate(cypher1, cypher2, _message)), "Failure: wrong key")
}

Using the census data, we now need to figure out the location in the clue. Luckily, it’s only a KQL query away.

Query Hint - Part 2
Getting the right size area can be tricky; a KQL command that will be extremely helpful is geo_point_to_h3cell.

Query challenge 4 - Part 2

//This query will narrow down fixed-size areas (H3 cells) until it finds the one that matches the set of trees given in the clue

let locations =
nyc_trees
| extend h3cell = geo_point_to_h3cell(longitude, latitude, 10)
| where spc_common == "'Schubert' chokecherry"
| summarize count() by h3cell, spc_common
| where count_ == 4
| summarize mylist = make_list(h3cell);
let final =
nyc_trees
| extend h3cell = geo_point_to_h3cell(longitude, latitude, 10)
| where h3cell in (locations)
| where spc_common == "Turkish hazelnut" or spc_common == "American linden"
| summarize count() by h3cell, spc_common
| where spc_common == "Turkish hazelnut" and count_ == 1
| project h3cell;
nyc_trees
| extend h3cell = geo_point_to_h3cell(longitude, latitude, 10)
| where h3cell in (final)
| where spc_common == "American linden"
| top 1 by tree_dbh asc
| project latitude, longitude
| extend TourLink = strcat('https://www.google.com/maps/@', latitude, ',', longitude, ',4a,75y,32.0h,79.0t/data=!3m7!1e1!3m5!1s-1P!2e0!5s20191101T000000!7i16384!8i8192')
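
The three filtering stages in the query can be easier to follow outside KQL. Below is a rough Python sketch of the same logic on made-up data: keep the cells with exactly four chokecherries and exactly one hazelnut, then pick the smallest linden in them. The cell ids, the sample rows and the function name are hypothetical; in the real query the cell id comes from geo_point_to_h3cell and the size from tree_dbh.

```python
from collections import Counter

CHOKECHERRY = "'Schubert' chokecherry"
HAZELNUT = "Turkish hazelnut"
LINDEN = "American linden"


def smallest_linden(trees):
    """trees: list of (cell_id, species, trunk_diameter) tuples."""
    # Count trees per (cell, species), like summarize count() by h3cell, spc_common
    counts = Counter((cell, species) for cell, species, _ in trees)
    # Keep cells with exactly four chokecherries and exactly one hazelnut
    cells = {
        cell
        for cell, _, _ in trees
        if counts[(cell, CHOKECHERRY)] == 4 and counts[(cell, HAZELNUT)] == 1
    }
    # Smallest linden inside the surviving cells, like top 1 by tree_dbh asc
    lindens = [t for t in trees if t[0] in cells and t[1] == LINDEN]
    return min(lindens, key=lambda t: t[2]) if lindens else None


# Hypothetical sample data: only cell_a satisfies both species counts
trees = (
    [("cell_a", CHOKECHERRY, 4)] * 4
    + [("cell_a", HAZELNUT, 6), ("cell_a", LINDEN, 7), ("cell_a", LINDEN, 2)]
    + [("cell_b", CHOKECHERRY, 4)] * 4
    + [("cell_b", LINDEN, 1)]
    + [("cell_c", CHOKECHERRY, 4)] * 3
    + [("cell_c", HAZELNUT, 5), ("cell_c", LINDEN, 1)]
)

print(smallest_linden(trees))
```

Note that bucketing by cell is only an approximation of a 66-meter radius: trees that sit close together but fall either side of a cell boundary would be missed.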


Now that we have a location, we’re not done yet, and here’s where the fun really starts. Our generated link takes us on a “Tour of the City” by opening a Google Maps Street View link. Have a look around for our mysterious “El Puente”; you may need to walk around a little bit.

Now that we’ve found the message, there’s one more thing to do: use the Decrypt function to decode the message from our detective portal. This part was a little tricky and took a few tries to get the right string to use.

Decryption Key

Using the mural, the phrase we are looking for is “ASHES to ASHES”.

There we have it, another secret message! Keep a hold of this answer as you’ll need it to complete the final challenge.

Well done Detective, we’ve been on quite the journey. See you in the next challenge!



Kusto Detective Agency: Challenge 3 – Bank robbery!

Challenges

I must admit that the difficulty spike on the challenges is both refreshing and surprising. The level of care that went into crafting each of these scenarios is outstanding and the ADX team have certainly outdone themselves. If you like these cases as much as I do, you can let the team know at kustodetectives@microsoft.com

General advice

Again, this case requires some pretty heavy assumptions to solve, some of which the hints will give you clarity on. When trying to solve the bank robbery, it’s very easy to end up with an overcomplicated solution that takes you in the wrong direction, so try to keep this one simple.

Challenge 3: Bank robbery!

For this challenge you need to track down the hideout of a trio of bank robbers. It seems simple: you have the address of the bank and are provided with all the traffic data for the area, so it’s just a case of figuring out where the robbers drove off to.

Query Hint
The trick with this challenge is that you need to build a set of vehicles that weren’t moving during the robbery; the catch, of course, is that only moving vehicles have records in the traffic data. KQL commands that will be useful for this challenge are join (remember that there are different kinds of joins) and arg_max.

Bonus cool tip

Thanks to my colleague Rogerio Barros for showing me this one, because it is awesome! Due to the nature of the traffic data, it is actually possible to plot the route of any number of cars using | render scatterchart. Below is a visual representation of three random cars as they move about Digitown; this is quite interesting once you have identified the three suspects.

Bank robbery challenge text

We have a situation, rookie.
As you may have heard from the news, there was a bank robbery earlier today.
In short: the good old downtown bank located at 157th Ave / 148th Street has been robbed.
The police were too late to arrive and missed the gang, and now they have turned to us to help locating the gang.
No doubt the service we provided to the mayor Mrs. Gaia Budskott in past – helped landing this case on our table now.

Here is a precise order of events:

  • 08:17AM: A gang of three armed men enter a bank located at 157th Ave / 148th Street and start collecting the money from the clerks.
  • 08:31AM: After collecting a decent loot (est. 1,000,000$ in cash), they pack up and get out.
  • 08:40AM: Police arrives at the crime scene, just to find out that it is too late, and the gang is not near the bank. The city is sealed – all vehicles are checked, robbers can’t escape. Witnesses tell about a group of three men splitting into three different cars and driving away.
  • 11:10AM: After 2.5 hours of unsuccessful attempts to look around, the police decide to turn to us, so we can help in finding where the gang is hiding.

Police gave us a data set of cameras recordings of all vehicles and their movements from 08:00AM till 11:00AM. Find it below.

Let’s cut to the chase. It’s up to you to locate gang’s hiding place!
Don’t let us down!

Query challenge 3

//This query will find the cars that were not moving during the robbery but left the bank right after it, then look for three of them ending up at the same address

let Cars =
Traffic
| where Street == 148 and Ave == 157
| where Timestamp > datetime(2022-10-16T08:31:00Z) and Timestamp < datetime(2022-10-16T08:40:00Z)
| join kind=leftanti (
    Traffic
    | where Timestamp >= datetime(2022-10-16T08:17:00Z) and Timestamp <= datetime(2022-10-16T08:31:00Z)
) on VIN
| summarize mylist = make_list(VIN);
Traffic
| where VIN in (Cars)
| summarize arg_max(Timestamp, *) by VIN
| summarize count(VIN) by Street, Ave
| where count_VIN == 3



Now just wait for the police to swoop in and recover the stolen cash. Another job well done, detective!


Kusto Detective Agency: Challenge 2 – Election fraud in Digitown!

Challenges

These challenges are a fantastic hackathon-style approach to learning KQL. Every week poses a new and unique approach to different KQL commands, and as the weeks progress I’ve learned some interesting tricks. Let’s take a look at challenge 2.

General advice

I’ve mentioned previously that there are hints that can be accessed from the detective UI. From this challenge onwards the hints provide critical information; without them you need to make assumptions which, if incorrect, will throw you off the correct solution.

This is also the first challenge that has multiple ways to get to the answer; in this post I will be discussing the more interesting one.

Challenge 2: Election fraud?

The second challenge ramps up the difficulty: you’ve been asked to verify the results of the recent election for the town mascot.

Query Hint
In order to solve the challenge, you need to figure out if any of the votes are invalid and, if any are, remove them from the results.
KQL commands that will be helpful are the anomaly detection functions, particularly series_decompose_anomalies, and bin. Alternatively, you can make use of format_datetime and a little bit of guesswork.
Election Fraud challenge text

Query challenge 2

//This query will analyze the votes for the problem candidate and look for anomalies; if any are found, they will be removed from the final count, giving the correct results for the election!

let compromisedProxies = Votes
| where vote == "Poppy"
| summarize Count = count() by bin(Timestamp, 1h), via_ip
| summarize votesPoppy = make_list(Count), Timestamp = make_list(Timestamp) by via_ip
| extend outliers = series_decompose_anomalies(votesPoppy)
| mv-expand Timestamp, votesPoppy, outliers
| where outliers == 1
| distinct via_ip;
Votes
| where not(via_ip in (compromisedProxies) and vote == "Poppy")
| summarize Count=count() by vote
| as hint.materialized=true T
| extend Total = toscalar(T | summarize sum(Count))
| project vote, Percentage = round(Count*100.0 / Total, 1), Count
| order by Count
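
For anyone who wants to see the shape of this logic outside KQL, here is a small Python sketch on invented data. To be clear, it does not reimplement series_decompose_anomalies; it swaps in a much cruder rule (an hourly count more than `factor` times the proxy's median) purely as a stand-in. The sample votes, the second candidate name "Other" and both function names are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import median


def compromised_proxies(votes, candidate, factor=5):
    """votes: list of (hour, via_ip, vote). Flags proxies with an hourly spike."""
    hourly = defaultdict(Counter)  # via_ip -> hour -> number of votes
    for hour, via_ip, vote in votes:
        if vote == candidate:
            hourly[via_ip][hour] += 1
    flagged = set()
    for via_ip, counts in hourly.items():
        typical = median(counts.values())
        # Crude stand-in for anomaly detection: a spike far above the median
        if any(count > factor * typical for count in counts.values()):
            flagged.add(via_ip)
    return flagged


def recount(votes, candidate, flagged):
    """Drop the candidate's votes from flagged proxies, return percentages."""
    kept = Counter(vote for _, via_ip, vote in votes
                   if not (via_ip in flagged and vote == candidate))
    total = sum(kept.values())
    return {name: round(n * 100.0 / total, 1) for name, n in kept.items()}


# Hypothetical votes: proxy 10.0.0.1 shows a burst of Poppy votes in hour 2
votes = (
    [(0, "10.0.0.1", "Poppy")] * 2
    + [(1, "10.0.0.1", "Poppy")] * 2
    + [(2, "10.0.0.1", "Poppy")] * 40
    + [(0, "10.0.0.2", "Poppy")] * 3
    + [(1, "10.0.0.2", "Poppy")] * 2
    + [(2, "10.0.0.2", "Poppy")] * 3
    + [(0, "10.0.0.1", "Other")] * 6
    + [(1, "10.0.0.2", "Other")] * 6
)

flagged = compromised_proxies(votes, "Poppy")
print(flagged, recount(votes, "Poppy", flagged))
```

As in the query, every Poppy vote from a flagged proxy is discarded, not just the votes in the anomalous hour, while that proxy's votes for other candidates are kept.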



Digitown can sleep easy knowing that they have their correct town mascot due to your efforts! Stay tuned for some excitement in challenge 3.
