Password cracking
Latest revision as of 13:50, 26 August 2018
Initial steps
Steps required for password cracking software:
- Identify which columns contain the username and the password (hashed or otherwise). It may be easier to convert the input to a standard internal representation before processing.
- Identify the algorithm used.
- Identify whether a salt is used.
From these there are multiple stages:
- If no salt is used (e.g. plain MD5), consult a pre-computed lookup table.
- If a small salt is used (say, fewer than 4 bits), consult pre-computed lookup tables; with so few possible salt values, one table per salt is still practical.
- If a sensible algorithm is used (e.g. bcrypt with large salt), check dictionary, then common words, then variants of the previous two, then brute force.
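The stages above could be sketched roughly as follows. This is only an illustration: the function names (`crack`, `variants`, `candidates`), the rule of prepending the salt to the password, and the two variant rules are all assumptions, and MD5 stands in for the real algorithm purely so the sketch runs with the standard library. The final brute-force stage is left out.

```python
import hashlib
from typing import Dict, Iterable, Iterator, Optional


def md5_hex(text: str) -> str:
    """Hex-encoded MD5 digest of a UTF-8 string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def variants(word: str) -> Iterator[str]:
    """Hypothetical variant rules: capitalise, or append '1' / '123'."""
    yield word.capitalize()
    for suffix in ("1", "123"):
        yield word + suffix


def candidates(dictionary: Iterable[str], common: Iterable[str]) -> Iterator[str]:
    """Dictionary words, then common words, then variants of both."""
    base = list(dictionary) + list(common)
    yield from base
    for word in base:
        yield from variants(word)


def crack(target_hash: str, lookup_table: Dict[str, str],
          dictionary: Iterable[str], common: Iterable[str],
          salt: str = "") -> Optional[str]:
    target = target_hash.lower()
    # Stage 1: unsalted hashes can be answered straight from the table.
    if not salt and target in lookup_table:
        return lookup_table[target]
    # Stage 2: hash each candidate in order (assumed scheme: salt + password).
    for candidate in candidates(dictionary, common):
        if md5_hex(salt + candidate) == target:
            return candidate
    return None  # a real tool would fall back to brute force here
```

For example, `crack(md5_hex("abdragon123"), {}, ["dragon"], [], salt="ab")` finds `dragon123` through the variant stage.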
Identifying an algorithm
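- Length: 32 hexadecimal characters (16 bytes) is likely to be MD5.
- Characters: 0-9a-fA-F indicates a hex-encoded digest. This is consistent with MD5 but also with other algorithms such as SHA-1 and SHA-256, so it should be combined with the length check.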
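A minimal sketch of these two checks, under some assumptions: the function name `guess_algorithms` and the length table are illustrative, and the table lists only a few well-known digest sizes. Other algorithms share these lengths (MD4 also produces 32 hex characters, for instance), so the result is a hint rather than an identification.

```python
import string
from typing import List

# Hex digest lengths of a few common algorithms (not exhaustive).
HEX_DIGEST_LENGTHS = {
    32: ["MD5"],       # 16 bytes
    40: ["SHA-1"],     # 20 bytes
    64: ["SHA-256"],   # 32 bytes
    128: ["SHA-512"],  # 64 bytes
}

# bcrypt hashes carry an identifying prefix rather than being plain hex.
BCRYPT_PREFIXES = ("$2a$", "$2b$", "$2y$")


def guess_algorithms(hash_value: str) -> List[str]:
    """Return candidate algorithm names based on prefix, charset and length."""
    value = hash_value.strip()
    if value.startswith(BCRYPT_PREFIXES):
        return ["bcrypt"]
    # Only hex-encoded digests are matched against the length table.
    if value and all(ch in string.hexdigits for ch in value):
        return HEX_DIGEST_LENGTHS.get(len(value), [])
    return []
```

An empty list means neither check matched, in which case the hash would need manual inspection.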
Lookup tables
- How should these be delivered? Plain text file, SQLite database, Lightning Memory-Mapped Database (LMDB), something else?
- What options does the chosen language support?
- Which options are the most efficient?
- Can lookup tables be built entirely in memory and then flushed to disk? Committing after every insert, as SQLite does by default, protects against data loss but is slower because of the repeated I/O. Answer: yes, wrap the whole build in one large transaction and commit once at the end.
- Trade-off between size of table (and time to generate) and coverage. May not be worthwhile building lookup tables for anything more than dictionary words and common passwords.
- For very fast hashing such as MD5, it may be quicker to just brute-force everything.
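The single-transaction approach mentioned above can be sketched with Python's `sqlite3` module. The table name `md5_lookup`, its schema and the function names are assumptions made for illustration, and MD5 is used only because it is the example algorithm in these notes.

```python
import hashlib
import sqlite3
from typing import Iterable, Optional


def build_lookup_table(conn: sqlite3.Connection, words: Iterable[str]) -> None:
    """Insert hash -> plaintext rows for all words in one transaction."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS md5_lookup "
        "(hash TEXT PRIMARY KEY, plaintext TEXT NOT NULL)"
    )
    rows = ((hashlib.md5(w.encode("utf-8")).hexdigest(), w) for w in words)
    # 'with conn' commits once when the block ends (or rolls back on error),
    # so every insert below shares a single transaction.
    with conn:
        conn.executemany(
            "INSERT OR IGNORE INTO md5_lookup (hash, plaintext) VALUES (?, ?)",
            rows,
        )


def lookup(conn: sqlite3.Connection, target_hash: str) -> Optional[str]:
    """Return the stored plaintext for a hex digest, or None if absent."""
    row = conn.execute(
        "SELECT plaintext FROM md5_lookup WHERE hash = ?",
        (target_hash.lower(),),
    ).fetchone()
    return row[0] if row else None
```

Passing `sqlite3.connect(":memory:")` keeps the table in memory, while a file path gives an on-disk table; the build code is the same in both cases.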
Possible contents of lookup tables:
- Dictionary words
- Common words not in dictionary (e.g. TV shows)
- Simple combinations, such as dictionary word concatenated with '1', '123' etc.
- Every possible combination of upper- and lower-case letters and digits (a-z, A-Z, 0-9) from 6 to 12 characters in length.
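To get a feel for the size-versus-coverage trade-off, the number of candidates in the last item can be computed directly. This sketch assumes the alphabet is the 62 characters a-z, A-Z and 0-9; the helper name `keyspace` is illustrative.

```python
import string

# Assumed alphabet: lower case, upper case and digits (62 characters).
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits


def keyspace(alphabet_size: int, min_len: int, max_len: int) -> int:
    """Number of strings with length between min_len and max_len inclusive."""
    return sum(alphabet_size ** n for n in range(min_len, max_len + 1))


total = keyspace(len(ALPHABET), 6, 12)
# A raw MD5 digest is 16 bytes, so this is a lower bound on table size.
digest_bytes = total * 16

print(f"candidates: {total:.2e}")
print(f"digest storage alone: {digest_bytes:.2e} bytes")
```

The total is on the order of 10^21 candidates, far beyond what can be stored, which supports limiting lookup tables to dictionary words and common passwords.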
Performance
Vectorisation vs concurrency vs parallelisation.
Languages
Language choice is a combination of performance, available libraries and existing knowledge. Obvious initial candidates are:
- C
- CPython (reference implementation of Python)
- PyPy (an alternative Python implementation written in RPython with a JIT compiler; often faster than CPython but sometimes behind in terms of version support)
Libraries
Ultimately most crypto libraries end up being a wrapper around OpenSSL.
Python
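- hashlib (https://docs.python.org/3/library/hashlib.html) is part of the Python standard library and uses OpenSSL for its hash implementations when available.
- Python bindings to LMDB (http://lmdb.readthedocs.io/en/release/)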
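As a brief illustration of hashlib, the snippet below hashes a candidate password with a named algorithm. The helper `hash_password` is a hypothetical convenience wrapper, not part of hashlib; `hashlib.new`, `hexdigest` and `algorithms_guaranteed` are standard.

```python
import hashlib


def hash_password(algorithm: str, password: str) -> str:
    """Hex digest of a UTF-8 password using any algorithm hashlib supports."""
    return hashlib.new(algorithm, password.encode("utf-8")).hexdigest()


digest = hash_password("md5", "password")
print(digest)  # 5f4dcc3b5aa765d61d8327deb882cf99

# Algorithms available on every platform, regardless of the OpenSSL build.
print(sorted(hashlib.algorithms_guaranteed))
```

Selecting the algorithm by name makes it easy to feed in whatever the identification step suggests.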