C# SHA256 hashing in 2024
Table of contents
- 1 Creating a C# SHA256 hash
- 2 Some basics about "data"?
- 3 What is a hash in the first place?
- 4 Why do I need to hash "things"?
- 5 So hashing means encrypting – no?
- 6 Salts – The right seasoning?
- 7 I’ve seen file hashes as well?
- 8 Quick examples – C# SHA256 hashing
- 9 ++ .NET 5 Update – Static Algorithm Methods ++
- 10 Related posts
Creating a C# SHA256 hash
Creating a C# SHA256 hash has changed a little bit – pardon me for not knowing exactly when, though. At least that’s what I stumbled upon a few days ago, when I needed it in a customer project. So, in today’s post we will talk about the essential basics of hashing and how to create a SHA256 hash in C#. By the way – we won’t use any third-party library here.
Some basics about "data"?
If we want to find out what hashes actually are, we first need to step back for a moment and see how data is stored. As a first example, we could start with a simple string like "MyPassword123" (no, I won’t use Hello World 🤮). After seeing hundreds of bad passwords, I want to explicitly call out: "Don’t use such a bad password" – just sayin’.
Data basics
So when talking about "data" in general – in a computing context – we actually talk about something called "bytes". A single "byte" is the smallest addressable unit of memory inside a modern computer. Diving even deeper, a byte consists of 8 bits that are basically "connected" together. Take a look at the following illustration:
So as you can see in the image, there are 8 – more or less – independent bits, still acting together as one byte. Each one of these bits can be in one of two states: either on or off. Depending on those on/off states, the byte changes the value it represents. This is why it’s called binary – either your computer works, or it doesn’t #clownoff 😉! Next, we are going to look at some concrete data examples.
Example data
Please understand that I won’t go too deep into this, but I think a few basic examples are required to build at least some basic knowledge. Of course you can skip this section if this is baby stuff for you 😋! Let’s take a look at two examples now: one number and one string.
A number – let’s say 42
Let’s now take a look at a basic number being represented by those bits inside the byte, for example "42". After taking that sneak peek, I will explain it in more detail.
So as you can see, the binary representation (in the form of a byte) of the number "42" is "00101010". As I’ve already mentioned, each one of those bits can be either "on" or "off". This is represented by the digits "0" for "off" and "1" for "on".
With my (German-biased) English wording skills, I would explain it like this:
Each digit represents a "value" that doubles with every step. Important note: you start on the right side, with the value "1". Then you look at each bit (again, from the right side) and remember the active values:
- The "1" bit -> is off, so don’t take it into account = 0
- The "2" bit -> is on, take its value = 2
- The "4" bit -> is off, ignore it = 0
- The "8" bit -> is on, take its value = 8
So, if we continue like this with the "16", "32", "64" and "128" bits, we end up with the following active values: 2, 8 and 32. Now guess what this makes if we sum them up – correct, 42 🎉!
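The steps above can be sketched in a few lines of C#. This is just a minimal illustration; the class and method names (BitDemo, ToBinary, ActiveBitValues) are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BitDemo
{
    // Returns the 8-digit binary representation of a byte, e.g. 42 -> "00101010".
    public static string ToBinary(byte value) =>
        Convert.ToString(value, 2).PadLeft(8, '0');

    // Collects the place values (1, 2, 4, ... 128) of all bits that are "on",
    // starting with the rightmost bit.
    public static List<int> ActiveBitValues(byte value)
    {
        var values = new List<int>();
        for (var position = 0; position < 8; position++)
        {
            var placeValue = 1 << position;   // doubles with every step
            if ((value & placeValue) != 0)    // is this bit "on"?
                values.Add(placeValue);
        }
        return values;
    }

    public static void Main()
    {
        byte number = 42;
        var active = ActiveBitValues(number);
        Console.WriteLine(ToBinary(number));            // 00101010
        Console.WriteLine(string.Join(" + ", active));  // 2 + 8 + 32
        Console.WriteLine(active.Sum());                // 42
    }
}
```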
A string being "MyPassword123"
After taking a look at this rather easy "number only" example, we will now go ahead and check out the mentioned string "MyPassword123". Although strings are more complex, they basically follow the same rules. Being a Zelda fan since good old kiddie times – and therefore loving to open chests – there is still one chest I don’t want to open here.
I’m talking about a topic called character encodings – seriously, I will take a shortcut here. For now, just keep in mind that we are using the basic ASCII table for "decoding" our letters. Let’s now go ahead and start decoding our string, beginning with the character "M".
Taking a look at the ASCII table – mentioned above – we can see that the capital "M" has an ASCII value / code of "077". On Windows, you can even try typing that code manually using your numpad (if you have one). Hold the "Alt" key, type the code "077", and after releasing "Alt" an "M" should appear 😎.
What is M’s binary representation?
Now let’s find out how the letter "M" is represented in binary form – knowing that its ASCII code is "077". Take a look at how the number 77 would be displayed in a bitwise manner:
Now we can just repeat the algorithm already explained a few steps above. Start from the right side, with the value 1. Double that value 7 times – which gets you to the final value of "128". Then check whether each bit is "active" and add its value to the sum:
- Bit 1 -> checked = 1
- Bit 3 -> checked = 4
- Bit 4 -> checked = 8
- Bit 7 -> checked = 64
Summing it all up, we now have our target value, which is "77". I will save you the time of repeating this for each and every other letter inside our target string 😁! If you still want to see the whole binary representation of our string, here it comes:
01001101 01111001 01010000 01100001 01110011 01110011 01110111 01101111 01110010 01100100 00110001 00110010 00110011
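If you want to reproduce that line yourself, a small sketch could look like the following. It assumes plain ASCII input (as in the shortcut taken above); the names StringBitsDemo and ToBinaryString are invented for this example.

```csharp
using System;
using System.Linq;
using System.Text;

public static class StringBitsDemo
{
    // Encodes the text as ASCII and renders every byte as 8 binary digits,
    // separated by spaces.
    public static string ToBinaryString(string text) =>
        string.Join(" ", Encoding.ASCII.GetBytes(text)
            .Select(b => Convert.ToString(b, 2).PadLeft(8, '0')));

    public static void Main()
    {
        Console.WriteLine(ToBinaryString("M"));              // 01001101
        Console.WriteLine(ToBinaryString("MyPassword123"));  // 13 groups of 8 bits
    }
}
```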
What is a hash in the first place?
After taking a look into the basics of storing data inside the computer’s memory, we can now go a bit further (pun intended). I think the most basic explanation goes like this: "You provide some value, apply some algorithm to it and get an output value". One more thing to mention is: "The same input will always result in the same output".
Before we continue with blabla 😅, I would like to show you 3 examples:
- ca978112ca1bbdcafac231b39a23dc4da786eff8147c4e72b9807785afee48bb
- bc7b8851671f2fda237a53f5057a0376037b6d062e65f965c62aa1d047498759
- befc3e74171cfac5f713b41512ded54cee67aa10aec174659f99ac19fb2f2974
First important point – "unknown" source
Pretty much nonsense, huh? Well, not really – you actually know at least two of those things. Here you can already see one of the important points of hashing: a hash basically hides its origin, so you can’t tell the "source" by looking at the hash. Further down, we will explore why / when this can be really helpful.
Fixed output length – the second point
The first hash from above is – surprisingly – just the lowercase letter "a", weird – right!? The second one is the password string we used as an example above. The third one is especially interesting: it’s actually one of the image files I’ve used in this post.
A SHA256 hash is 256 bits long – always – and it is usually represented in hexadecimal notation. As each hexadecimal character encodes 4 bits, you end up with 256 / 4 = 64 characters. Therefore you could, for example, store it in a CHAR(64) field inside a typical MySQL database.
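To see "same input, same output" and the fixed length in action, here is a small sketch. The helper name Sha256Hex is made up for this example, and it assumes the text is hashed as UTF-8 bytes, like in the examples further down.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class HashPropertiesDemo
{
    // Hashes the UTF-8 bytes of the text and returns the hash as lowercase hex.
    public static string Sha256Hex(string text)
    {
        using var sha = SHA256.Create();
        var hash = sha.ComputeHash(Encoding.UTF8.GetBytes(text));
        return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
    }

    public static void Main()
    {
        Console.WriteLine(Sha256Hex("a"));
        // ca978112ca1bbdcafac231b39a23dc4da786eff8147c4e72b9807785afee48bb

        Console.WriteLine(Sha256Hex("a") == Sha256Hex("a"));           // True
        Console.WriteLine(Sha256Hex("a").Length);                       // 64
        Console.WriteLine(Sha256Hex(new string('x', 10_000)).Length);   // 64
    }
}
```

No matter whether the input is one letter or ten thousand, the output stays at 64 hex characters.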
Third point – it’s a one-way train
The third important point is that a hash is irreversible: once you go forwards, you can’t go back. It might sound like a disadvantage, but it can be a real benefit in some situations. We will dive deeper into the "where is it good" and "why" in a few seconds.
Why do I need to hash "things"?
Well, after seeing all these basics and stuff, you could ask yourself: "But why is hashing used at all?". This is a really good question. There’s one special use case which comes to my mind pretty fast.
Users, Login and stuff
The first use case I strictly associate with hashing is something almost every app needs: a login / authentication of a user. We have all used a login screen hundreds of times, but how does it actually work in the background?
When a user confirms the entered credentials, like "robert" as username and "MyPassword123" as password, they get sent to the server side of our app. Then the server app will go ahead and try to pull the matching user record for the specified username. If it can’t find a matching user record, we usually get a corresponding error.
Hashing comes into play
The interesting part happens next, when the user could actually be found using the provided username. The server side will then hash the entered password with the configured hash algorithm. Having the hashed password in hand, it compares it with the password from the database – which is stored as a hash as well. If both hashes match, the entered password was correct.
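As a rough sketch, the flow described so far could look like this. Everything here is simplified on purpose: the in-memory dictionary stands in for the database, the names LoginDemo and TryLogin are invented, and the password is hashed without a salt (salts are covered further down).

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

public static class LoginDemo
{
    // Hypothetical "user table": username -> stored SHA256 hash of the password.
    private static readonly Dictionary<string, byte[]> Users = new()
    {
        ["robert"] = Hash("MyPassword123"),
    };

    public static byte[] Hash(string password)
    {
        using var sha = SHA256.Create();
        return sha.ComputeHash(Encoding.UTF8.GetBytes(password));
    }

    public static bool TryLogin(string username, string password)
    {
        // Step 1: look up the user record; an unknown user means the login fails.
        if (!Users.TryGetValue(username, out var storedHash))
            return false;

        // Step 2: hash the entered password and compare hash with hash.
        // FixedTimeEquals avoids leaking information through timing differences.
        return CryptographicOperations.FixedTimeEquals(Hash(password), storedHash);
    }

    public static void Main()
    {
        Console.WriteLine(TryLogin("robert", "MyPassword123"));  // True
        Console.WriteLine(TryLogin("robert", "WrongPassword"));  // False
        Console.WriteLine(TryLogin("nobody", "MyPassword123"));  // False
    }
}
```

Note that the clear-text password is never stored anywhere; only hashes get compared.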
If you remember that one important point where I said that "hashes are irreversible", then this is where their actual power lies. I think everyone has seen those typical "breaking IT news" stories where someone’s database got stolen. There are similar headlines like "xy thousand user datasets have been leaked".
Now imagine what would happen if you could just read everyone’s password in clear text like: "Oh yeah, Robert’s password is ‘MyPassword123’". Someone could try to use this password in the same application or in other important apps like PayPal. I mean, many people use their email address as login name, so.. And with those databases getting leaked, you have them anyway – hacky-boy jackpot.
Conclusion
So the primary aspect of hashing has its roots – in my opinion – in the context of security. The 3 core aspects of hashes are: "Being one-way only", "Having a fixed length" and, most importantly, "The source is unknown". That enables storing and comparing passwords, PINs, etc. in a secure manner, making it less critical if they get leaked.
So hashing means encrypting – no?
Well, this is one of the most common misconceptions, even in more advanced developer circles. I came across wordings like "yeah, let’s encrypt it with SHA256" in my business context more often than you might think. You should remember one thing: hashing isn’t encrypting & decrypting.
As I already mentioned above, hashing only goes in one direction: you input something, it does its magical mathematical work and you end up with your output. Encryption means something different: you are able to encrypt something and, with the right key, decrypt it back again.
But what is the catch – talking about collisions
If you have been in the universe of programming for some time, you might have heard of a hashing algorithm called "MD5". Just a few days ago, when I worked for a Tunisian customer, the Message-Digest Algorithm 5 was actually being used. Considering the security context, let me tell you one thing: don’t use it in a security-relevant context nowadays!
When I said that "hashes are irreversible", that was – more or less – the full truth. But there is one thing called "collisions": two different inputs resulting in the same hash. As the output has a fixed length, collisions exist in theory for every hash algorithm, but with bad / outdated algorithms they can be produced on purpose with realistic effort. The Message-Digest Algorithm 5 is one of those you should therefore avoid.
Salts – The right seasoning?
You might have heard about a thing called "salt / salts" in the context of hashing algorithms. If so, that’s pretty good, because it’s an additional measure for more security. If you think of your hash algorithm as some kind of dinner being cooked, then a salt adds more flavour to your algo.
How does it theoretically work?
A salt is "something additional" – usually a random value – that is combined with the password before it is hashed and stored in the database. This ensures that even if two users choose the same password, the hashed output will still be different. If some user gets hacked, you can’t guess like "ah, this other user has the same hash, therefore they have the same password".
Another nice security thing about salts is the added protection against the rainbow table hacking method. A rainbow table attack is basically a hacking attempt where you use some kind of premade / precached table of hashed values. Maybe this is a topic for another post, I think this one is already long enough 😂.
But where does the salt go?
This is a question I (and many more guys) asked myself when I first got in contact with salting hashes. I thought: "Well, if I randomly and uniquely generate that salt on, like, registration, how am I supposed to ever find it again?". Adding to this: "And how am I then ever able to check the password on a login attempt?".
The answer is pretty easy: you just store the salt alongside the hashed password inside the database.
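A minimal sketch of that idea follows. Several details are assumptions made only for this example: the salt is 16 random bytes, it is simply put in front of the password bytes before hashing, and the returned tuple stands in for the two database columns. The names SaltDemo, Register and Verify are invented.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class SaltDemo
{
    // Hashes salt + password (assumption for this sketch: the salt goes first).
    public static byte[] HashWithSalt(string password, byte[] salt)
    {
        var passwordBytes = Encoding.UTF8.GetBytes(password);
        using var sha = SHA256.Create();
        return sha.ComputeHash(salt.Concat(passwordBytes).ToArray());
    }

    // Registration: generate a random salt once and keep it next to the hash.
    public static (byte[] Salt, byte[] Hash) Register(string password)
    {
        var salt = new byte[16];
        RandomNumberGenerator.Fill(salt);
        return (salt, HashWithSalt(password, salt));
    }

    // Login: reuse the stored salt, hash again and compare.
    public static bool Verify(string password, (byte[] Salt, byte[] Hash) stored) =>
        CryptographicOperations.FixedTimeEquals(
            HashWithSalt(password, stored.Salt), stored.Hash);

    public static void Main()
    {
        var first = Register("MyPassword123");
        var second = Register("MyPassword123");

        // Same password, but different salts lead to different hashes.
        Console.WriteLine(first.Hash.SequenceEqual(second.Hash));  // False
        Console.WriteLine(Verify("MyPassword123", first));         // True
        Console.WriteLine(Verify("WrongPassword", first));         // False
    }
}
```

The salt doesn’t need to be secret; it only has to be available again when the login attempt is checked.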
I’ve seen file hashes as well?
As a passionate hobby gamer (yes, even at 30, and I still will be at 80) I have seen those many times as well. Those little "hey, this file has a hash of xyz" hints, mostly located on download pages. And here it’s a security aspect as well, who would’ve thought that, huh!?
When downloading files, there are some things that can go wrong or even turn out to be a security problem. Sure, there are less and more problematic things, but think of not getting the actual game client / ROM you wanted. You enter your login details, which could then be sent to the malicious producer of this software – while you are still thinking: "Yes, I got the right game".
Don’t trust, validate!
So there is this nice basic rule which settled in many minds when Bitcoin first came to people’s attention: "Don’t trust, validate!". It essentially explains what these file hashes are all about. By publishing the hash, the official vendor can tell you: "This is the original and correct state of the software I’m providing to you".
After downloading the software, you can create the hash of the file yourself and compare it with the published one, making sure that you actually got the right file. This means: no corruption has happened during the download and everything should be working fine. And this also means: no sneaky hacker has given you a little trojan which they could use to spy on your activities and credentials. This is especially essential when downloading your software through some sort of sharing network.
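Such a check could be sketched like this. The names DownloadCheckDemo and MatchesPublishedHash are made up, and the "download" is just a temp file containing the letter "a", so that the published hash is the one already shown earlier in this post.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class DownloadCheckDemo
{
    // Computes the SHA256 hash of a file and returns it as hex without dashes.
    public static string Sha256OfFile(string filePath)
    {
        using var sha = SHA256.Create();
        using var stream = File.OpenRead(filePath);
        return BitConverter.ToString(sha.ComputeHash(stream)).Replace("-", "");
    }

    // Compares the file's hash with the hex string published by the vendor,
    // ignoring upper / lower case and surrounding whitespace.
    public static bool MatchesPublishedHash(string filePath, string publishedHex) =>
        string.Equals(Sha256OfFile(filePath), publishedHex.Trim(),
            StringComparison.OrdinalIgnoreCase);

    public static void Main()
    {
        // Stand-in for a downloaded file: a temp file holding the single byte "a".
        var path = Path.GetTempFileName();
        File.WriteAllBytes(path, new byte[] { 0x61 });

        var published = "ca978112ca1bbdcafac231b39a23dc4da786eff8147c4e72b9807785afee48bb";
        Console.WriteLine(MatchesPublishedHash(path, published));  // True

        // One changed byte is enough to make the check fail.
        File.WriteAllBytes(path, new byte[] { 0x62 });
        Console.WriteLine(MatchesPublishedHash(path, published));  // False

        File.Delete(path);
    }
}
```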
Patching your favourite game
Another example is when you have some old ROM for your favourite nostalgia game. There could be some developer who made a nice romhack (extending / changing the original for more fun). They would most likely need everyone to start from the exact same original file, so that the patch can be applied exactly as intended – and comparing hashes is how you check that.
Quick examples – C# SHA256 hashing
No time for blabla? Just take a look at the different quick examples listed below. Finally I can write some code for you to create some C# SHA256 hash 🤓!
The old, outdated way
In the outdated version (where the SHA256Managed class even gets flagged by Visual Studio) we have a basic StringBuilder. Then we create an instance of the hash algorithm with the factory method called "Create". Note that I’m using the newer "using" syntax in one line, without curly braces.
Next we get the UTF8 bytes of the "clearText" string variable and pass them to the "ComputeHash" function of our hash algorithm instance. Then we iterate over the byte array called "result", formatting each byte into its hex representation. At the end, we return the hashed string.
public static string SHA256Outdated(string clearText)
{
    var sb = new StringBuilder();
    using var hash = SHA256Managed.Create();
    var result = hash.ComputeHash(Encoding.UTF8.GetBytes(clearText));
    foreach (var b in result)
        sb.Append($"{b:x2}");
    return sb.ToString();
}
The new version
The newer version works similarly to the one above, except that we are using the "Create" factory method of the "HashAlgorithm" class. There we provide the name of the hash algorithm we want to use. I love using the "nameof" expression, but feel free to just provide "SHA256" as a string. Apart from that, we do pretty much the same as above.
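using System.Text;
using System.Security.Cryptography;

private static string SHA256(string clearText)
{
    var sb = new StringBuilder();
    var bytes = Encoding.UTF8.GetBytes(clearText);
    // name-based factory; .NET 7 and later flag it as obsolete (SYSLIB0045)
    using var algo = HashAlgorithm.Create(nameof(SHA256));
    // Create returns null for unknown algorithm names, hence the "!"
    var hash = algo!.ComputeHash(bytes);
    // format each byte as two lowercase hex digits
    foreach (var byt in hash)
        sb.Append(byt.ToString("x2"));
    return sb.ToString();
}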
Hashing a file
This is an example of hashing an actual file like we have talked about further above.
public static string SHA256FileHash(string filePath)
{
    using var algo = HashAlgorithm.Create("SHA256");
    // open read-only, so files without write permission work as well
    using var filestream = new FileStream(filePath, FileMode.Open, FileAccess.Read);
    // should be zero by default anyways..
    // filestream.Position = 0;
    var hashValue = algo!.ComputeHash(filestream);
    // BitConverter yields "AB-CD-..", so strip the dashes (uppercase hex remains)
    var hash = BitConverter.ToString(hashValue).Replace("-", "");
    return hash;
}
++ .NET 5 Update – Static Algorithm Methods ++
So a user just commented like "this is wrong" – well okay, what a neat description / criticism.. I thought I should write an update with my thought process about this. I’m pretty open to any well-founded criticism and arguments, so go ahead if you have other suggestions.
The back story of his argument could be: the code analysis rule with the id "CA1850" says that you should use the static method of the corresponding hash algorithm, called "HashData", instead of "ComputeHash". As the rule describes it, this way you can avoid creating and managing an algorithm instance each time. But who says you need to?
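For reference, the static variant that CA1850 points to could look roughly like this. It needs .NET 5 or later; the wrapper names StaticHashDemo and Sha256Hex are made up for this sketch.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class StaticHashDemo
{
    public static string Sha256Hex(string clearText)
    {
        // No algorithm instance to create or dispose: HashData is a static method.
        byte[] hash = SHA256.HashData(Encoding.UTF8.GetBytes(clearText));

        // Convert.ToHexString returns uppercase hex, so lowercase it to match
        // the "x2" formatting used in the examples above.
        return Convert.ToHexString(hash).ToLowerInvariant();
    }

    public static void Main()
    {
        Console.WriteLine(Sha256Hex("a"));
        // ca978112ca1bbdcafac231b39a23dc4da786eff8147c4e72b9807785afee48bb
    }
}
```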
In my opinion, I want to be able to switch out different hash algorithms as easily as possible. Further, I would do this by injecting the right algorithm instance via dependency injection – once. Like this, I can maintain a nice, loosely coupled infrastructure and I’m not over-creating instances.
Just my opinion: in the end, you can decide for yourself whether you want to listen to the code analysis rule CA1850, or whether you want to use DI and inject an actual instance of a hash algorithm. On the other hand, if "switching out" isn’t necessary for you – like when you always need one specific algorithm – you can just go with the static methods.
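To make the "inject an instance" idea a bit more concrete, here is a small sketch using plain constructor injection, without any DI container. The TextHasher class is hypothetical and only exists for this example; in a real app the instance would be registered once in whatever container you use.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical service: it doesn't know which algorithm it gets, it just uses it.
public sealed class TextHasher
{
    private readonly HashAlgorithm _algorithm;

    public TextHasher(HashAlgorithm algorithm) => _algorithm = algorithm;

    // Note: instance members of HashAlgorithm are not guaranteed to be thread safe,
    // so a shared instance should not be used from several threads at once.
    public string Hash(string clearText) =>
        Convert.ToHexString(_algorithm.ComputeHash(Encoding.UTF8.GetBytes(clearText)))
            .ToLowerInvariant();
}

public static class InjectionDemo
{
    public static void Main()
    {
        using var sha256 = SHA256.Create();
        using var sha512 = SHA512.Create();

        // Switching the algorithm only means injecting a different instance.
        Console.WriteLine(new TextHasher(sha256).Hash("a").Length);  // 64
        Console.WriteLine(new TextHasher(sha512).Hash("a").Length);  // 128
    }
}
```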