By the way, congrats to the PPP team for the awesome organization of the CTF and its challenges. :-)
Original description
"Top operative laser mic'd a room where a robot conspirator was logging into the robot government's secret interface. We were able to clean up the audio file significantly, but have no clue anymore."Overview
The challenge is supposed to be very straightforward, because we only have a recorded audio file of someone typing in a keyboard. Assuming that each key emits a different sound when pressed, if we have enough keystrokes, theoretically we should be able to infer the text being typed, making some assumptions (like the expected language and so on).After listening carefully to the 6 minutes of the audio file, and given the general quality of the challenges, we can safely assume that other approaches (like steganography or guessing) have to be much less probable.
A summary of the process to follow would be:
- Detect and separate each keystroke
- Group them according to their sound patterns and similarity
- Given a keystroke, classify and tag it according to its group
- Generate a cryptogram with the tagged keystrokes
- Solve the cryptogram assuming a monoalphabetic substitution cipher
Detecting the keystrokes
Instead of applying a formal method, I started writing some code and testing with some variations and I decided to follow that empirical way, keeping the process as simple as possible, so probably some part of my process is unnecessary or even have a negative impact on the final results or accuracy.Well, my first step was to clean the audio, reducing the background noise and randomness of the signal, mainly by filtering and minimizing the lowest frequencies.
Original audio signal
Filtered audio signal
Then, I saved this filtered audio as raw PCM data (44100 Hz, 16 bits, mono), instead of the compressed MP3 format of the original file, so now we have the signal directly represented as a byte array, which is a necessary step in order to perform calculations over it.
Each sound can be decomposed as a sum of frequencies, and a well-known mathematical operation to get those frequencies from its time representation is the Fourier transform. Our input is sampled data in time domain, so we will use the Discrete Fourier Transform (DFT) to transform it to the frequency domain.
We can try to detect the beginning of each keystroke based on the amplitude of the signal, and then start taking samples from these points, but I decided to start with a simpler approach, dividing uniformly the whole signal in blocks of the same size, and calculate its DFT.
Spectral view of the same audio signal
I chose a window size of 2048 samples (46 ms) because in average is big enough to fit a whole keystroke alone (without overlapping with the closest ones), so we will have to compute the DTF of 3840 blocks in total. The DTF function will output a sequence of complex numbers, but for our purposes of spectral analysis, we can simplify and take their magnitude as a valid measure, so we will use an array of integers to hold each DFT.
Once we have the DFT of each block, we can try to determine if it contains the sound of a keystroke or not by estimating the energy or power of the signal and checking if it's greater than a threshold. For instance, this can be done by summing the squares of each value.
Let's see the spectrum graph of some keystrokes to see what we have done until now:
Comparing different and similar keystrokes
Well, as we might expect, we can clearly appreciate the difference between the graphs of the presumably different keystrokes (on the left), and the similarities of the keystrokes that appear to be originated by the same key (on the right).
Grouping the keystrokes
Now we can sort all of these samples by some indicator of their similarity. For this task I tried with a very simple method, minimizing the Euclidean distance of the DFT vectors between each pair of consecutive samples. To check if that method was apparently good enough or not, I saved that sorted samples in a new audio file, getting the following:
Keystrokes sorted by minimum distance
The results are quite impressive, because the consecutive samples sound almost identically, and at the same time we can clearly see that they are also sorted by energy, so the first ones are more likely to be echoes rather than being keystrokes.
The last step involving the audio data is to group and tag all the samples. There are a lot of IA algorithms to classify a group of samples, but again, I decided to choose one of the simplest ones: scrolling the spectral view of the sorted samples manually and writing down in a paper the exact instants when the frequencies changes substantially (it could be done visually and aided by the sound itself).
Visual representation on how I divided the groups
In the transition zone between a group and the next one, sometimes we'll find some samples which aren't clearly belonging to the first or the second group, but the error rate should be small in any case, and should not affect the global recovery of the text, because we have a lot of samples compared with the number of groups.
Tagging the keystrokes
In my implementation, the algorithm detected 1993 positive samples, and then I classified them in 42 different groups (much more than the expected 26 letters of the alphabet plus the space bar, carriage return and some punctuation signs). There are many reasons to justify this fact, but let's continue with the process and they'll be explained later.In my opinion, a real keyboard won't produce so many different and distinguishable sounds, like the keys of a piano, but most keys have to sound almost identically. At this point, we can suspect that the audio file isn't a real recording, so if it's specifically generated to be solvable, there could be a chance that each sound has some direct relationship with its ASCII code or something, but I didn't found any.
In a real scenario, the process should require more complex methods and algorithms, involving accurate cepstrums, training, probabilities, Markov chains, neural networks and so on, but in our case we will be tagging each group sequentially (for instance, starting with the labels "a", "b", "c"…), and then sort them again according to the original positions of each sample. After that process, we will have a text cryptogram ready to be solved.
Solving the cryptogram
I started replacing the labeling the biggest groups, assuming that they have to be the space bar or the letter "e" (the most frequent keys in English), checking if the cryptogram is still consistent with the expected average word lengths. After that, I continued analyzing and replacing the most common or unusual word patterns, for instance, assuming that the ones of 3 letters have to be words like "the" or "are", the ones of 5 or 6 letters that matches with "robot" and "robots", or the almost unique pattern of the word "effective" (which appears 3 times in the text).Finally, after replacing most of the letters, we have a lot of readable phrases, and we can infer new conclusions. The most important conclusion is that some special keys (like shift or enter) are actually emitting multiple and different sounds when pressed and when released. We can see it clearly at the beginning of the phrases, because they are starting with an uppercase letter. By the way, although we can read most of the text, this detail will be crucial to solve the challenge, as we will see.
My recovered text
Finally, this is the raw text I managed to recover from the audio, without any manual correction. Some notes regarding this text:- The uppercase letters are the ones which have been replaced
- The lowercase are some of the original tags of the different groups (so they are quite random and have no real meaning), because their classification wasn't good enough
- The symbol "^" represents the shift key press, and "-" is the release of this key
LOGIN ROBOT ^M-Y KEYSTROKES ARE ^FAe -ThO LOUD ^H,- MAIL.F IAIL R ^mDANGERL HACAERSm -OERLORD^K-ROBOMmIL.CO.ROBO ^I-TCSEEMS THATCTVE HSMAhNS ARE GATHERINh THEIR FORCES^R^ ^G-REETINGS TmAELLER, ^A-S YOUmKNOW, OUR WORGT IS IN eERIL. ^O-UR ENEMIES, THERODOTS, BECOME MORE POWERFUL mITH EVERY PASSImGmTmY. ^B-EFORE WE ARE CAPTURED ANT ALL HOPE IS LOSTe, WE QUSTHMAKE ONE LAST STAND AGAINST THEM FOR ALL OF HUMANITY.F ^U-P UmTIL NOWh OUR ONLY EFFECTIeE TOOLS AGAINST THE ROBOTSmHAVR BmEN DRUTE FORCE AND DIAMOND WEAPONRY. ^O-BVIOUSLY DImMONDS ARE EFFECTIhE OR A NUMBER OF REASONS ^C-HARD ENO GH TO CUT THROUGH ROBOT ARMOR, STRONG ENOUGH TO MAKE EFFECTIeE RESTRAINTS, AND SEINY^Y-, HOWEVER, THE R BOTS ARE KNOKLEDGEABLE OFmTHESE TOOLS, AND HAVE BEEN QUBTE SUCCESSFUK IN RAIDING OUR DImMOND STOPAGE FACILITIESP ^F-URTHER, ALTHOUGH OUR KNIGHTS ARE BRAVm, THE ROBOTS ARE NUMEROSS, AmD VERYO DVIFFICSLT TODEFEATh ^L-UCKILY, WE HA.E A SECRET WEAPOH. ^Y-OU. ^H^ACKERS. ^I-T ICS RECENTLY COME TO OUR ATWENTION TAT OUR ROBOT FOES ARE FUGP OF BUGS. ^F-EW ARE SKILLED ENOUGHmIN THE TYPES OF WIIARDRY RESUIRED T FIND THESE BUGSW BUT WHEN THEY ARE FOUND, THESE BUGS PROVIDE AN EASY WAY TO TAKE OVER AND DESTmOY THE RO,OTS. ^F-URTHER, MANY OF THE ROBOTS HA.E HIDDEN DESTRUCTION CODm SEQUENCES^L- FIND QHE SEQUENCE, AND WEmCAN CAUSE TE ROBOT TO SELF DESTRUCTe m^W-OTH OURmDIAMOND SURPLY DmINDLINh AND OUR ARMYmOF KNIGHTS LIMITEF, OUR eLAN IS TO IAKE A HACAINh BLIT TO CATCH THE ROBOTS BY SURPRISE. ^m-E ARE CALLING ALL THE IACKImG WIIARDS ACROSS THE LAND Th WORK ThhETHER FOR YI HOURS T TAKE OUT AS MANY ROBOTS AS POSSIBLE. ^W-ITE ENOUGH HELP, mE THINK IT WILL BE POSSIBLE TO OVERCOME OUR ROBOT ENEMIES AND DESTROY THEBR EIL REIGN. ^W-E ARE C-UNTING ON YOUR HELPW GOOD LUCK. . DATE LOGOUTWell, we can see that the recovered text was not perfect (probably because I simplified excessively the process), but it's highly readable and I stopped there, although with some refinements the algorithm probably could perform better.
The original text was taken mainly from the synopsis of the PlaidCTF, and the flag seems to be one of the sentences at the beginning. My first attempt on the scoring panel would have been: "My keystrokes are FAR too loud", so I contacted the organizers explaining what I did and they told me that I was close, but it wasn't the correct flag (and revealed me the right one).
The original text
According to the organizers, this was literally the original text (and they used a script to generate the audio file). In this case, the "^" symbol represents a pause.login ^robot My keystrokes are >>WAY<< too loud ^^mail ^^^^^^^^^^^^mail -s "DANGER: HACKERS" overlord@robomail.co.robo ^It seems that the humans are gathering their forces: Greetings traveller, As you know, our world is in peril. Our enemies, the robots, become more powerful with every passing day. Before we are captured and all hope is lost, we must make one last stand against them for all of humanity. Up until now, our only effective tools against the robots have been brute force and diamond weaponry. Obviously diamonds are effective for a number of reasons (hard enough to cut through robot armor, strong enough to make effective restraints, and shiny), however, the robots are knowledgeable of these tools, and have been quite successful in raiding our diamond storage facilities. Further, although our knights are brave, the robots are numerous, and very difficult to defeat. Luckily, we have a secret weapon. You. Hackers. It has recently come to our attention that our robot foes are full of bugs. Few are skilled enough in the types of wizardry required to find these bugs, but when they are found, these bugs provide an easy way to take over and destroy the robots. Further, many of the robots have hidden destruction code sequences: find the sequence, and we can cause the robot to self destruct. With our diamond supply dwindling and our army of knights limited, our plan is to make a hacking blitz to catch the robots by surprise. We are calling all the hacking wizards across the land to work together for 48 hours to take out as many robots as possible. With enough help, we think it will be possible to overcome our robot enemies and destroy their evil reign. We are counting on your help, good luck. . ^^^^^^date ^^^logout
Final remarks
When I saw the angle brackets on the flag, I thought that they were emphasizing the wrong word in my expected flag, but no, they weren't… these symbols are actually part of the flag!I can imagine when somebody thought something like: "Hey, this challenge is very easy, so let's give it 200 points only, and to make it harder, let's use a flag with short words and strange symbols in the middle" xDD
Initially, I argued them that it was impossible to determine that symbols of the flag, as they weren't used in any other part of the text, but they replied to me that the angle brackets, in the American keyboard layouts, are the same keys (and sounds) of comma and dot, combined with the shift key. So yes, they're right and I completely agree, but I didn't noticed that, mainly because in my Spanish keyboard (and many others) these symbols are sharing the same key, so they are in a different location, and also, in the text I recovered, the process missed and confused some keystrokes.