Password cracking: Difference between revisions

Latest revision as of 13:50, 26 August 2018

Initial steps

Steps required for password cracking software:

Identify which columns contain the username and the password (hashed or otherwise). It may be easier to convert the input to a standard internal representation before processing.
Identify the algorithm used.
Identify whether a salt is used.

The answers determine which approach to take:

If no salt is used (e.g. plain MD5), consult a pre-computed lookup table.
If a small salt is used (fewer than, say, 4 bits), consult a pre-computed lookup table. A 4-bit salt has only 16 possible values, so a table covering every salt is still feasible.
If a sensible algorithm is used (e.g. bcrypt with a large salt), check dictionary words, then common words, then variants of the previous two, and finally fall back to brute force.
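The ordering for the salted case (dictionary, then common words, then variants) can be sketched roughly as below. This is only an illustration: bcrypt is not in the Python standard library, so PBKDF2-HMAC-SHA256 with an assumed low iteration count stands in for it, and the names `variants`, `candidates`, `slow_hash` and `crack` are hypothetical.

```python
import hashlib
from itertools import chain


def variants(word):
    """Yield a few simple variants of a word (illustrative, not exhaustive)."""
    yield word.capitalize()
    yield word.upper()
    for suffix in ("1", "123"):
        yield word + suffix


def candidates(dictionary, common):
    """Yield guesses in the order: dictionary, common words, then variants."""
    base = list(chain(dictionary, common))
    yield from base
    for word in base:
        yield from variants(word)


def slow_hash(password, salt):
    # Stand-in for bcrypt (assumption): PBKDF2-HMAC-SHA256 with a
    # deliberately low iteration count so the example runs quickly.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 1000)


def crack(target, salt, dictionary, common):
    """Return the first candidate whose hash matches target, else None."""
    for guess in candidates(dictionary, common):
        if slow_hash(guess, salt) == target:
            return guess
    return None  # a real tool would fall back to brute force here
```

Because the salt is part of every hash computation, each guess has to be hashed again for each stored record, which is why pre-computed tables stop being useful here.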

Identifying an algorithm

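Length: 32 hexadecimal characters (16 bytes, 128 bits) is likely to be MD5, although other 128-bit digests such as MD4 or NTLM have the same length.
Characters: 0-9a-fA-F on its own only indicates a hex-encoded digest; combined with a length of 32 it is likely to be MD5.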
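A minimal sketch of these two checks follows. The function name `guess_algorithms` and the table `HEX_LENGTHS` are hypothetical, and the entries beyond MD5 (SHA-1, SHA-256, SHA-512 digest lengths and the bcrypt `$2a$`/`$2b$`/`$2y$` prefixes) are additions not listed in the notes above. The result is a list of candidates, not a definitive answer.

```python
import string

# Hex digest lengths (in characters) of a few common algorithms.
HEX_LENGTHS = {
    32: ["MD5", "MD4", "NTLM"],
    40: ["SHA-1"],
    64: ["SHA-256"],
    128: ["SHA-512"],
}


def guess_algorithms(stored):
    """Return candidate algorithm names for a stored hash string."""
    # bcrypt hashes are self-describing: they start with a version prefix.
    if stored.startswith(("$2a$", "$2b$", "$2y$")):
        return ["bcrypt"]
    # Otherwise require a pure hex string and look the length up.
    if stored and all(c in string.hexdigits for c in stored):
        return HEX_LENGTHS.get(len(stored), [])
    return []
```

Length and character set can only narrow the field; confirming the algorithm still means hashing a known password and comparing.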

Lookup tables

How should these be delivered? Plain text file, SQLite database, Lightning Memory-Mapped Database (LMDB), something else?
What options does the chosen language support?
Which options are the most efficient?
Can lookup tables be built entirely in memory and then flushed to disk? Committing after every insert (SQLite's autocommit behaviour) prevents data loss but is slower because of the repeated I/O. Answer: yes, wrap the whole build in one large transaction and commit at the end.
There is a trade-off between the size of the table (and the time to generate it) and coverage. It may not be worthwhile building lookup tables for anything more than dictionary words and common passwords.
For very fast hashes such as MD5, it may be quicker to just brute-force everything.
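The single-transaction approach could look something like this with the standard library's `sqlite3` and `hashlib` modules. The table name `lookup`, its two-column schema and the function names are assumptions made for the example, not a settled design.

```python
import hashlib
import sqlite3


def build_table(words, path=":memory:"):
    """Build an MD5 -> plaintext lookup table in a single transaction."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS lookup (hash TEXT PRIMARY KEY, plain TEXT)"
    )
    rows = (
        (hashlib.md5(w.encode("utf-8")).hexdigest(), w) for w in words
    )
    # 'with conn' commits once when the block exits without error,
    # or rolls back if an exception is raised.
    with conn:
        conn.executemany("INSERT OR IGNORE INTO lookup VALUES (?, ?)", rows)
    return conn


def lookup(conn, digest):
    """Return the plaintext for a hex digest, or None if it is not stored."""
    row = conn.execute(
        "SELECT plain FROM lookup WHERE hash = ?", (digest.lower(),)
    ).fetchone()
    return row[0] if row else None
```

Passing a file path instead of `":memory:"` writes the same table to disk; the primary key on `hash` gives an index for the lookups.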

Possible contents of lookup tables:

Dictionary words
Common words not in dictionary (e.g. TV shows)
Simple combinations, such as dictionary word concatenated with '1', '123' etc.
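Every possible combination of upper- and lower-case letters and digits (a-z, A-Z, 0-9) from 6 to 12 characters in length. This is a very large keyspace: the 12-character strings alone number 62^12, which exceeds 10^21.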
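To get a feel for the sizes involved, the sketch below counts the strings in a given length range and generates the simple word-plus-suffix combinations. The helper names `keyspace_size` and `simple_combinations` are hypothetical, and the default suffixes are just the two examples from the list above.

```python
import string

ALPHABET = string.ascii_letters + string.digits  # a-z, A-Z, 0-9: 62 symbols


def keyspace_size(alphabet_size, min_len, max_len):
    """Number of strings over the alphabet with min_len..max_len characters."""
    return sum(alphabet_size ** n for n in range(min_len, max_len + 1))


def simple_combinations(words, suffixes=("1", "123")):
    """Yield each word followed by each suffix."""
    for word in words:
        for suffix in suffixes:
            yield word + suffix
```

Calling `keyspace_size(len(ALPHABET), 6, 12)` gives the total number of entries the last item in the list would need, which helps when weighing table size against coverage.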

Performance

Vectorisation vs concurrency vs parallelisation.

Languages

Language choice depends on a combination of performance, available libraries and existing knowledge. Obvious initial candidates are:

C
CPython (reference implementation of Python)
PyPy (an alternative Python implementation written in RPython, with a JIT compiler; often faster than CPython but sometimes behind in terms of version support)

Libraries

In practice, many crypto libraries end up being wrappers around OpenSSL.

Python

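hashlib (https://docs.python.org/3/library/hashlib.html) is part of the Python standard library and uses OpenSSL for its hash implementations where available.
Python bindings to LMDB (http://lmdb.readthedocs.io/en/release/)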
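As a quick illustration of the hashlib API, the sketch below hashes a value with an algorithm chosen by name. The helper `hex_digest` is hypothetical, and prepending the salt to the data is just one assumed convention; real systems combine salt and password in various ways.

```python
import hashlib

# Algorithms that hashlib guarantees on every platform.
print(sorted(hashlib.algorithms_guaranteed))


def hex_digest(algorithm, data, salt=b""):
    """Hash salt + data with the named algorithm and return the hex digest."""
    h = hashlib.new(algorithm)
    h.update(salt)
    h.update(data)
    return h.hexdigest()
```

For example, `hex_digest("md5", b"password")` returns `5f4dcc3b5aa765d61d8327deb882cf99`, the well-known unsalted MD5 of "password".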

