Correct. This is a brute-force dictionary. It’s a very powerful tool, but its applications are severely limited. Any well-designed system has protection against brute-force attacks. It’s mostly useful for things like cracking encrypted databases, where the target is entirely under your control. You can’t just break into someone’s Gmail with it.
How would you even crack an encrypted database? I guess the hacker somehow stole it from the server and has it in their dump of other databases they’re trying to crack? I don’t do hacking, I’m just curious about how it works.
A hash is what you use when you want someone to be able to check whether they have the right password without being able to see what it actually says.
Imagine my password is “shark”. Let’s say I use a hash algorithm that turns it into “2gtth5”. When I log in, I enter my password. My browser* uses the same algorithm, so the text I entered is now “2gtth5”. The server looks up my hashed password, checks whether it’s the same, and then lets me log in. The benefit is that the server doesn’t know my actual password; it only knows that the hash is “2gtth5”. This means that if the database gets compromised, people only see “2gtth5”, not my actual password. And because it’s a hash, they don’t know how to get back from “2gtth5” to “shark”, so my password is not compromised.
Now imagine an attacker knows the hashing algorithm used and has a list of possible passwords. “shark” might be in there. They can take the password list, hash every password in it, and see whether any result matches. Because my password is in the list, the hash for “shark” will match the hash “2gtth5” in the compromised database, and they now know my actual password. This is a far bigger problem.
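To make the “shark” example concrete, here is a minimal Python sketch. It assumes SHA-256 from the standard library as a stand-in for the unnamed hash algorithm (the “2gtth5” value above is made up), and `hash_password` is a hypothetical helper name.

```python
import hashlib

def hash_password(password: str) -> str:
    """Return the hex SHA-256 digest of a password (illustrative only)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

stored = hash_password("shark")          # what the database would keep

# The same input always gives the same hash, so a login attempt can be checked...
print(hash_password("shark") == stored)  # True
# ...while a different input gives a different hash.
print(hash_password("Shark") == stored)  # False
# The digest has a fixed length regardless of the password.
print(len(stored))                       # 64
```

Nothing in `stored` can be run backwards to recover “shark”; the only way to test a guess is to hash it and compare.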
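The matching process described above can be sketched in a few lines of Python. This assumes unsalted SHA-256 and a tiny made-up wordlist in place of a real password list; `sha256_hex` and `crack` are hypothetical names for illustration.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Hash a candidate the same way the (assumed) target system does."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def crack(target_hash: str, wordlist):
    """Return the first candidate whose hash matches, or None."""
    for candidate in wordlist:
        if sha256_hex(candidate) == target_hash:
            return candidate
    return None

# Hypothetical leaked hash and a tiny stand-in for a real password list.
leaked_hash = sha256_hex("shark")
wordlist = ["123456", "password", "shark", "letmein"]

print(crack(leaked_hash, wordlist))               # shark
print(crack(sha256_hex("x9!Qv2#Lr8"), wordlist))  # None
```

The second call shows the limit of the approach: a password that is not in the list is never found.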
Every time you see that someone “hacked” a database and password hashes got compromised, this is what happens. They use RockYou and a few other lists to see if they can “crack” the hashes (this just means hashing each password from the list and seeing if one of them matches).
This is specifically what those lists are for. They are used by bad actors to make use of the hashed passwords they stole.
Glossary:
hash: a fixed-length, one-way representation of some text
cracking a hash: trying to recover the original text from a hash
salted hash: a hash computed from the text plus some extra random data (the salt), so that the same password does not always produce the same hash
algorithm: basically the way your program works, either the code or a formal description of the way it works
*Someone in the comments corrected me on this. The server does the hashing, not the browser.
My browser uses the same algorithm, so the text I entered is “2gtth5” now. The server looks up my hashed password
This is not correct. Your browser will submit “shark”, and then the backend server will do whatever hashing is required and compare the hashes. If the hashing happened in the browser, an attacker could log in using just the stolen hashes, without ever needing the passwords themselves. Also, in that case the browser would be responsible for the salting, which would make the salt pointless because it would be known.
The browser could do the hashing, but then the frontend would need the same salt, which is a huge liability. Some apps obfuscate it by encrypting with a nonce or something, but all that does is delay an attack.
Standard practice is indeed to hash on the server, with a limit on the number of attempts against the same account in a time window to prevent brute-force attacks.
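As a rough idea of what “a limited number of attempts in a time window” can look like, here is a small in-memory sketch. The class name `AttemptLimiter`, the limits, and the `now` parameter (used only to make the demo deterministic) are all assumptions for illustration, not how any particular service does it.

```python
import time
from collections import defaultdict, deque

class AttemptLimiter:
    """Allow at most max_attempts per account within window_seconds."""

    def __init__(self, max_attempts=5, window_seconds=300.0):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self._attempts = defaultdict(deque)  # account -> attempt timestamps

    def allow(self, account, now=None):
        now = time.monotonic() if now is None else now
        recent = self._attempts[account]
        # Drop attempts that have fallen out of the time window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_attempts:
            return False  # too many recent attempts: reject without checking the password
        recent.append(now)
        return True

limiter = AttemptLimiter(max_attempts=3, window_seconds=60.0)
results = [limiter.allow("alice", now=t) for t in (0, 1, 2, 3)]
print(results)                         # [True, True, True, False]
print(limiter.allow("alice", now=61))  # True
```

With a cap like this, working through a multi-million-entry list against a live login form is impractical, which is why the lists are aimed at stolen hashes instead.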
The DB doesn’t even store passwords, only a hash of each user’s password. When someone logs in, their password is hashed and compared to what’s stored in the DB. If they match, entry is granted.
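Here is a minimal sketch of that store-then-compare flow on the server side. It assumes PBKDF2 from Python’s standard library as the hashing scheme and a plain dict as the “database”; `register`, `login`, `_derive`, and the iteration count are hypothetical choices for the example.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor, chosen only for this example
users = {}            # hypothetical in-memory "database": username -> (salt, key)

def _derive(password: str, salt: bytes) -> bytes:
    """Derive a key from the password with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)

def register(username: str, password: str) -> None:
    """Store a random salt and the derived key, never the password itself."""
    salt = os.urandom(16)
    users[username] = (salt, _derive(password, salt))

def login(username: str, password: str) -> bool:
    """Hash the submitted password the same way and compare with the stored value."""
    record = users.get(username)
    if record is None:
        return False
    salt, stored_key = record
    return hmac.compare_digest(_derive(password, salt), stored_key)

register("alice", "shark")
print(login("alice", "shark"))   # True
print(login("alice", "sharks"))  # False
print(login("bob", "shark"))     # False
```

The server sees “shark” only for the duration of the request; what it keeps is the salt and the derived key.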
Passwords stored as one-way hashes are cracked by generating candidate passwords and running them through the same hash algorithm, like SHA-256, SHA-1, or MD5 if you’re especially shitty at protecting information. Same hash = same password in most cases. The cracking is done on GPUs because they excel at those types of functions. This doesn’t even consider salted hashes, which make the process more difficult for an attacker.
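To illustrate why salting makes things harder, the sketch below compares unsalted and salted hashes of the same password. It assumes SHA-256 with the salt simply prepended, which is only one possible construction, and uses a made-up three-word list.

```python
import hashlib
import os

def unsalted(password: str) -> str:
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

def salted(password: str, salt: bytes) -> str:
    # Assumed construction for the example: hash(salt + password).
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()

# An attacker can hash a wordlist once and reuse the table for every leak.
wordlist = ["123456", "password", "shark"]
lookup = {unsalted(word): word for word in wordlist}

# Unsalted: two users with the same password share a hash, and one lookup cracks both.
alice, bob = unsalted("shark"), unsalted("shark")
print(alice == bob, lookup.get(alice))        # True shark

# Salted: each user gets a random salt, so the hashes differ and the table misses.
salt_a, salt_b = os.urandom(16), os.urandom(16)
alice_s, bob_s = salted("shark", salt_a), salted("shark", salt_b)
print(alice_s == bob_s, lookup.get(alice_s))  # False None
```

Salting does not stop guessing altogether, since the salt is stored next to the hash, but the attacker has to redo the work separately for every user.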
You do this locally so that you don’t lock an account out, trip alerts, or get noticed by someone until you’re ready to gain access.
By guessing the correct password, which is where this brute-force dictionary comes in. A database, or other encrypted file, has no means of preventing repeat guesses, so you can take as many bites at the apple as you want. With high-end GPU clusters you can attempt anywhere from thousands of guesses per second against deliberately slow schemes to billions per second against fast hashes like MD5. If you restrict your guesses to likely answers only (which is the point of the password list), you can break through in a pretty reasonable amount of time, assuming a vaguely common password was used. Of course, if the database or file is encrypted with something like a random and sufficiently long alphanumeric password, that’s a whole different story, and your odds of getting in go down significantly.
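Some back-of-the-envelope arithmetic shows the gap between a password list and a random password. The guess rate, list size, and password length below are assumed round numbers for illustration, not measurements.

```python
GUESSES_PER_SECOND = 1_000_000_000  # assumed rate, purely illustrative
WORDLIST_SIZE = 14_000_000          # assumed number of entries in a password list
ALPHABET = 26 + 26 + 10             # upper case, lower case, digits
LENGTH = 12                         # assumed length of the random password
SECONDS_PER_YEAR = 365 * 24 * 3600

def seconds_to_exhaust(candidates: int, rate: int) -> float:
    """Worst-case time to try every candidate at a fixed guess rate."""
    return candidates / rate

wordlist_seconds = seconds_to_exhaust(WORDLIST_SIZE, GUESSES_PER_SECOND)
random_space = ALPHABET ** LENGTH
random_years = seconds_to_exhaust(random_space, GUESSES_PER_SECOND) / SECONDS_PER_YEAR

print(f"wordlist: {wordlist_seconds:.3f} seconds")
print(f"random {LENGTH}-char alphanumeric: {random_space:.2e} candidates, "
      f"{random_years:,.0f} years")
```

Under these assumptions the whole list is exhausted in a fraction of a second, while the random 12-character space (about 3.2 × 10^21 candidates) would take on the order of a hundred thousand years.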
There are other attacks of course, but those get significantly more complicated and rely on there being some sort of flaw in the encryption scheme to exploit, or on you managing to find the password by some other means (sniffing it out of memory while the system is live, social engineering, etc.).
I would assume that’s because the original RockYou list was always just used for dictionary brute-force attacks, so there are no associated usernames.
That has to do with how hashes work.
Ah, that makes sense. Let me put an asterisk on that, then.
The databases aren’t encrypted exactly…