JavaScript is not a low-level programming language and it offers very little to interact with the host machine. But the latest versions of JavaScript and the Web APIs give us the ability to play with raw binary data at runtime. In this article, we will learn about Text Encoding, Endianness, Typed Arrays, and Blob.
I am not a computer science student and never was. Hence some of the low-level stuff still confuses me but I will try my best to explain some of the key concepts about memory.

Memory Representation
In a typical computer, we have two kinds of memory. Persistent (non-volatile) storage, like a hard drive or flash drive, stores information permanently until we choose to delete it.
In contrast, RAM (random-access memory) is volatile memory that a program uses to store data while it is running. Since RAM is accessed by a program to store, read and manipulate data at a blazingly fast speed, it is built from fast solid-state memory chips.
Once the program is terminated, any data stored by the program in the RAM is wiped. When you shut down your computer, your RAM is emptied as the data inside it no longer serves any purpose. That is why we call it volatile memory (but not because it evaporates).
A physical memory (like a RAM module) contains a sequence of memory cells. A memory cell's job is to store one bit of information, which is either 0 or 1. Depending on the type of RAM, a memory cell is made of transistors (SRAM) or a transistor paired with a capacitor that holds an electric charge (DRAM).
1-bit memory is not very useful for storing anything meaningful. Hence, we generally store information in a number of bytes. A byte is a unit that represents a block of 8 bits, which is why a byte is also called an octet. A 32-bit value can also be called a 4-byte value, and a 64-bit value can be represented as 8 bytes (8 blocks of 8 bits).
Since a bit can either be 0 or 1, you might have guessed that a chunk of memory can be represented in the binary number system. For example, 1 byte of memory holds 8 bits and each of these bits can be either 0 or 1. From this chunk of memory, we can derive a decimal number (base 10).
For example, the binary number 0110 1101 in the decimal number system is 0*2⁷ + 1*2⁶ + 1*2⁵ + 0*2⁴ + 1*2³ + 1*2² + 0*2¹ + 1*2⁰, which amounts to 109. In base 16 (hexadecimal), that would be 6D (6*16¹ + 13*16⁰).
Similarly, if you have a larger chunk of memory, like a 32-bit or 64-bit value, it can also be translated to a decimal or hexadecimal number. The reverse is also possible: you can convert a decimal number to a binary or hexadecimal number.
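JavaScript can do these conversions for us. Here is a small sketch using the built-in parseInt and Number.prototype.toString methods, reusing the values from the example above.
parseInt( '01101101', 2 );   // binary string to decimal
⪢ 109
(109).toString(16);          // decimal to hexadecimal string
⪢ "6d"
(109).toString(2);           // decimal back to binary string
⪢ "1101101"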
Signed and Unsigned integers
Let's consider 1 byte of memory. The smallest value of 1-byte memory is 0, which is 0000 0000 in binary or 00 in hexadecimal. The largest value of 1-byte memory is 255, which is 1111 1111 in binary or FF in hexadecimal.
Hence we can say that 1 byte can represent 256 distinct values, which is sufficient to store any value between 0 and 255. Then what about negative numbers? If we dedicated the byte to negative values only, those same 256 patterns could represent numbers from 0 down to -255.
An unsigned integer is an integer without a positive or negative sign. An unsigned integer always has a positive value. Hence a 1-byte memory can be used to represent an unsigned integer between values 0 and 255.
In contrast, a signed integer can have a positive or negative sign. Since we have 8 bits in a byte, we can use the left-most bit to store the sign of the number while the other 7 bits store the value. If the left-most bit is 0, then it is a positive value, else it is a negative value.
Since we have only 7 bits to store the actual value, the maximum value we can store is 127. So if our byte looks like 1111 1111 in binary, it represents the value -127 in the decimal number system.
However, 0000 0000 is +0 while 1000 0000 is -0, but 0 cannot have a sign and a signed zero is meaningless. Hence, we simply reuse the -0 representation to store -128. This way, a signed 8-bit integer can hold values between -128 and 127.
Handling negative numbers in the binary number system is not as easy as it sounds. Computers mostly use two's complement to represent a signed integer. Using the left-most bit to store the sign might help us read the actual value of the integer in the decimal system, but it makes arithmetic between different binary numbers messy and unpredictable.
💡 You can read about two's complement in this Wiki article. If you need to understand in detail how two's complement is calculated and used in actual calculations, this video explains it brilliantly.
In the end, two's complement is just calculating one's complement and adding 1 to the resulting binary number. And one's complement is generated by flipping all the bits of a number such that 1 becomes 0 and 0 becomes 1.
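As a rough sketch, we can reproduce this in JavaScript with the bitwise NOT operator (which flips the bits) and a mask to keep only the lowest 8 bits; the number 65 here is just an arbitrary example.
var n = 65;
( ( ~n + 1 ) & 0xFF ).toString(2);   // one's complement (~n) plus 1, masked to 8 bits
⪢ "10111111"
( 256 - n ).toString(2);             // the same 8-bit pattern, computed as 2^8 - n
⪢ "10111111"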
So in general, an N-bit unsigned integer can hold a value between 0 and 2^N - 1, while an N-bit signed integer can hold a value between -2^(N-1) and 2^(N-1) - 1.
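These formulas are easy to verify in the console; the snippet below simply evaluates them for N = 8.
var bits = 8;
Math.pow( 2, bits ) - 1;       // maximum of an 8-bit unsigned integer
⪢ 255
-Math.pow( 2, bits - 1 );      // minimum of an 8-bit signed integer
⪢ -128
Math.pow( 2, bits - 1 ) - 1;   // maximum of an 8-bit signed integer
⪢ 127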
Binary file formats
A file is a collection of some meaningful data. For example, a text file contains some text while an image file contains image data.
Depending on the extension of the file, a specific program can read the file correctly. If you open a .png file in a text editor, it won't display the image; only image viewer applications can display that image correctly.
Every file is made up of bits or blocks of bytes. A text file may contain a block of bytes dedicated to storing each character of the text while an image file may contain blocks of bytes to represent each pixel of the image.
Just by changing the extension of a file, we cannot convert a file from one format to the other (because it doesn't change its internal data). An extension is useful to provide a little information about the type of file beforehand so that an appropriate program can read it automatically.
A file may contain other data that is not visible to the user. For example, an image file may contain additional information about the file name, image size, file size, the camera used to capture the photo, etc.
This additional information is also part of the file and is generally called metadata. Hence, when we look at the binary representation of a file (using a hex viewer), we might be surprised to find additional data.
MIME (Multipurpose Internet Mail Extensions) Types
The MIME Type is a key used to identify a type of document. A file could have .png or .txt extension which is used by an OS or a program to identify the type of document and take appropriate action.
However, when a file or data does not have an extension, as in the case of an HTTP response body or a buffer in memory, MIME Types are very useful to identify what kind of data we are dealing with.
When a web browser receives a response from a server, it looks at the Content-Type header, which could specify the text/html MIME Type. Using this information, the browser will render an HTML document. If the MIME Type is image/png, the browser will display the image or download it.
Here is a complete list of available MIME Types.
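In the browser, we can inspect the MIME Type of an HTTP response with the fetch Web API. The URL below is just a placeholder, and the output assumes the server responds with a PNG image.
fetch( 'https://example.com/logo.png' ).then( ( response ) => {
    console.log( response.headers.get( 'Content-Type' ) );
} );
⪢ "image/png"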
Character Encoding
When we think about a text file, we think only about characters such as letters, numbers and other signs. But as we know, a file can contain metadata, so it could carry some additional payload (bits) along with the text.
But strictly speaking, a plain text file has no metadata. Hence when we open a file in hex-viewer (a program that shows the content of a file in Hexadecimal numbers) and see individual bytes, we should see only the blocks that represent each character of the text content.
Encoding, by definition, is a way to convert data from one format to another. Hence character encoding is a way to convert a character to a binary number. That binary representation of a character can then be converted to a decimal or hexadecimal number.
ASCII Encoding
In the early era of digital computing, ASCII introduced a brilliant idea for storing characters of the English language in a computer's memory: assign a decimal value to each character and convert that to a binary number.
Since only a few characters were commonly used, ASCII used a 7-bit block of memory to store 128 characters. This is the list of ASCII characters and their decimal, binary and hexadecimal values.
In the decimal number system, uppercase letters start from A, which has the value 65, and end with Z, which has the value 90. Then lowercase letters start from a, which has the value 97, and end with z, which has the value 122.
A => [0100 0001] => (65 / 0x41)
Z => [0101 1010] => (90 / 0x5A)
a => [0110 0001] => (97 / 0x61)
z => [0111 1010] => (122 / 0x7A)
So if we have to store a plain text file in ASCII encoding, each character of the text would take 7 bits. However, each character will be stored in a byte with the leading bit set to 0 for easy storage and memory management.
The left-most bit is basically a dead bit because it has no purpose.
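We can peek at this byte layout from JavaScript itself. The snippet below uses charCodeAt (covered in detail later in this article) and padStart to print the 8-bit pattern of the character A; note the leading 0, the dead bit.
'A'.charCodeAt( 0 );
⪢ 65
'A'.charCodeAt( 0 ).toString(2).padStart( 8, '0' );
⪢ "01000001"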
But since computing was a worldwide phenomenon and many non-English speaking countries were adopting digital computers, there wasn't a uniform way to exchange information. UTF (Unicode Transformation Format) is a standard of globally accepted encoding schemes for information interchange.
UTF provides the standard for encoding text in the UTF-8, UTF-16 and UTF-32 encoding formats. Let's talk about UTF-8 and the rest will be easy to understand.
UTF-8 Encoding
Unlike ASCII, UTF-8 encoding uses 8 bits or 1 byte as its basic unit to store a character. But since only 256 characters can be stored in 1 byte, UTF-8 uses up to 4 bytes if a character needs more space to store its value.
UTF-8 is ASCII compatible. Since a character in ASCII encoding takes up 8 bits with the leading bit set to 0, UTF-8 does the same. In UTF-8 encoding, all ASCII compatible characters are saved in the ASCII format. Hence an ASCII reader program can parse UTF-8 encoded document with no problem.
However, if a character is beyond the ASCII range (like a Chinese mandarin character), it will take more than 1-byte space. In that case, UTF-8 sets the leading bit to 1 to identify the non-ASCII character.
You should also learn about extended-ASCII encoding, which utilises the left-most bit (the dead bit) to add another 128 characters to the set. But since this bit will be set to 1, the text file is no longer UTF-8 compatible. Read more.
A code-point is a numeric value given to each character. For an ASCII-compatible character, the code-point is the value of the byte; hence the code-point of the character A is 65. Unicode has given every character in existence a fixed code-point, and we identify the character in any encoding format using this code-point.
A character in UTF-8 encoding can also take more than 1-byte. The number of bytes a character can take depends on the code-point and UTF-8 encoding logic. Each byte (block) is called a code-unit.
Let's take the example of the character å. Unicode has assigned it the code point value 229, which is E5 in hexadecimal. This character can't be stored in a single byte in UTF-8, so it takes 2 code-units (2 bytes). In UTF-8 binary representation, it looks like 11000011 10100101 (0xC3 0xA5).
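We can verify this in the browser console with the TextEncoder Web API, which always encodes a string to UTF-8 bytes (a close cousin of the TextDecoder we will use later in this article).
new TextEncoder().encode( 'å' );
⪢ Uint8Array(2) [195, 165]     // 0xC3 0xA5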
This Wiki explains UTF-8 character encoding in great detail, but I would also recommend checking out Tom Scott's video on Computerphile.
UTF-16 and UTF-32 Encoding
UTF-16 or UTF-32 follows the same principles of UTF-8. However, UTF-16 encoding uses 16 bits or 2 bytes for a code-unit and a character can take 1 or 2 code-units depending on the value of its code-point.
UTF-32 encoding uses 32 bits or 4 bytes for a code-unit and every character can be represented in a single code-unit. It is a fixed-length encoding since every character takes a fixed number of code-units, which is 1.
Since both UTF-16 and UTF-32 use more than 1 byte to store a character, they are not compatible with ASCII.
As of today, Unicode can represent 1,112,064 valid code points.
Charset vs Character Encoding
As we learned, a code-point is the integer value associated with a character. The code-point depends on the character set, not on the encoding. For example, the value of the letter A is 65 in ASCII as well as in UTF-8, UTF-16, and UTF-32.
A character set is a table that maps characters with their unique numbers. Encoding is how a character will be represented in binary numbers.
So for example, in ASCII, the value of A will be stored in 7-bits. If an ASCII reader reads a value of 65, it interprets that as character A.
In the case of UTF, character set for UTF-8, UTF-16 and UTF-32 encoding is the same. That means, in each encoding, 65 means the character A. However, each encoding will read the data differently.
For UTF-8, it will consider blocks of 8-bits to construct a character. Similarly, UTF-16 will consider blocks of 16-bits or 2-bytes to construct a character.
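A quick way to see the difference is in a Node.js environment, where Buffer.from can encode the same characters with different encodings. This is only a sketch, and the exact console output may vary.
Buffer.from( 'A', 'utf8' );      // 1 code-unit of 1 byte
⪢ <Buffer 41>
Buffer.from( 'A', 'utf16le' );   // 1 code-unit of 2 bytes
⪢ <Buffer 41 00>
Buffer.from( 'å', 'utf8' );      // 2 code-units of 1 byte each
⪢ <Buffer c3 a5>
Buffer.from( 'å', 'utf16le' );   // 1 code-unit of 2 bytes
⪢ <Buffer e5 00>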
Looking inside a Plain Text file
Let's create a plain text file that contains the text "Hello World" in it. You should use the nano or vim editor to create this file as other rich-text editors might add some metadata to the file.
If we open this file in any online Hex Editor, you can see individual code-units of the characters. Since we are dealing with ASCII characters, each character is represented by a single code-unit.

If we look at this ASCII table and compare each byte on the left, we can relate it with each character in the string on the right. We have 11 characters in the string and our file contains 11 bytes.
However, the space character is shown as a period, presumably because this hex editor prints a period for bytes it does not render as visible characters, including whitespace.
What will happen if we change the character e with å, which cannot be represented in ASCII? Let's modify the file and check again.

Since the character å takes 2 bytes (2 code-units) as seen before, the single-byte character e (code-unit: 0x65) is replaced with the 2 code-units of å (0xC3 0xA5). Now we have 12 bytes of data in our file.
Since this Hex Editor displays a character for each byte and neither C3 (decimal: 195) nor A5 (decimal: 165) belongs to the ASCII character set, it simply printed a period character to signify an unprintable ASCII character.
However, a normal text editor will read one byte at a time and check if a group of bytes makes up a character. If you have read the UTF-8 documentation, then you already know how this works.
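If you prefer the command line over an online Hex Editor, a Node.js one-liner can dump the same bytes. This sketch assumes the file is saved as hello.txt in the current directory and contains exactly "Hello World" with no trailing newline.
const fs = require( 'fs' );
fs.readFileSync( 'hello.txt' ).toString( 'hex' );
⪢ "48656c6c6f20576f726c64"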
Typed Arrays in JavaScript
JavaScript is not a statically typed language. A variable in JavaScript can hold any type of data at runtime. The same goes for objects and arrays: each index of an array can hold data of any type.
However, the ES2015 version of JavaScript provides a few types of Array that can hold specific types of data. These are called Typed Arrays.
Each element of a typed array is an integer number or a floating-point number. Depending on the type of a Typed Array, the value of a number must be between a specific range. For example, a 1-byte or 8-bits unsigned integer can have a value between 0 and 255.
In JavaScript, a typed array can hold signed and unsigned integers of 8, 16, 32 and 64 bits, as well as floating-point numbers. Below is the list of some Typed Array constructors and their value ranges (source: MDN).
Int8Array (8 bits or 1 byte): -128 to 127
Uint8Array (8 bits or 1 byte): 0 to 255
Int16Array (16 bits or 2 bytes): -32768 to 32767
Uint16Array (16 bits or 2 bytes): 0 to 65535
Int32Array (32 bits or 4 bytes): -2147483648 to 2147483647
Uint32Array (32 bits or 4 bytes): 0 to 4294967295
BigInt64Array (64 bits or 8 bytes): -2^63 to 2^63 - 1
BigUint64Array (64 bits or 8 bytes): 0 to 2^64 - 1
Float32Array (32 bits or 4 bytes): 1.2x10^-38 to 3.4x10^38
Float64Array (64 bits or 8 bytes): 5.0x10^-324 to 1.8x10^308
However, Typed Arrays are primarily used to represent a chunk of memory in an array of bytes. For example, a 4-bytes or 32-bits chunk of memory can be divided into Int8Array or Uint8Array of length 4.
Instantiating an empty Typed Array was not possible before, but as of the ES2017 specification, we can create an empty Typed Array using the new keyword.
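For example, we can instantiate a Typed Array directly with a length, or from an existing array of numbers.
var empty = new Uint8Array( 4 );      // 4 elements, each initialized to 0
⪢ Uint8Array(4) [0, 0, 0, 0]
var filled = Uint8Array.from( [ 72, 101, 108, 108, 111 ] );
⪢ Uint8Array(5) [72, 101, 108, 108, 111]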
ArrayBuffer
JavaScript provides the ArrayBuffer constructor to allocate a fixed-length binary data buffer in memory. This constructor accepts the number of bytes to allocate.
var buffer = new ArrayBuffer(4); // 4-bytes buffer
The byteLength property of an ArrayBuffer object returns the number of bytes it contains. Hence, buffer.byteLength should return 4.
From the above example, we have a 4-bytes empty region of memory. Initially, all bits are set to 0 state. Hence, the memory looks like below.
[ 00000000 00000000 00000000 00000000 ]
We cannot directly modify the memory content of an ArrayBuffer object. For that, we need to use a Typed Array, which represents the underlying ArrayBuffer as an array of typed numbers.
Since a Typed Array does not hold any memory of its own, manipulating the elements of a Typed Array modifies the underlying ArrayBuffer accordingly. In a nutshell, a Typed Array is an interface to the ArrayBuffer.
Let's create a Uint8Array object from our buffer. This object will point to buffer and translate each byte into an unsigned 8-bit integer.
var bufferUint8 = new Uint8Array(buffer);
⪢ Uint8Array(4) [0, 0, 0, 0]
Since our buffer is 4-bytes long and each byte is basically empty, bufferUint8 is an array of length 4. We can check how many bytes a single element of a Typed Array represents using BYTES_PER_ELEMENT property.
bufferUint8.BYTES_PER_ELEMENT // prototype-property
⪢ 1
Each element of the bufferUint8 is a number converted from each byte of buffer. Since each byte of buffer is empty, it will be converted to 0 and we can see that in the result.
We can also create a Typed Array representation of a buffer with an offset and length, which means we can look at a specific portion of a buffer and manipulate it without accidentally affecting other memory. Head over to the MDN documentation on creating a Typed Array with an offset and length; a quick sketch follows below.
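A minimal sketch: the view below looks only at the middle 2 bytes of our 4-byte buffer (byte offset 1, length 2), so writing through it can never touch the first or last byte.
var middleBytes = new Uint8Array( buffer, 1, 2 );
⪢ Uint8Array(2) [0, 0]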
Let's modify the content of bufferUint8 and see how it affects buffer. We can assign a value to an element of bufferUint8 using an index.
bufferUint8[1] = 65
⪢ Uint8Array(4) [0, 65, 0, 0]
In the above example, we have assigned the 2nd element of the bufferUint8 with the value of 65. We can verify the modification in the result. However, in binary representation, buffer will now look like this.
[ 00000000 01000001 00000000 00000000 ]
If we assign an out-of-range value to an element of a Typed Array, JavaScript will silently wrap it instead of throwing an error. The value is reduced modulo the element size, so assigning 256 to an element of a Uint8Array stores 0, and assigning -1 stores 255 (256 - 1).
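A quick sketch of this wrapping behaviour, using a separate throwaway Uint8Array so we do not disturb our buffer:
var demo = new Uint8Array( 2 );
demo[ 0 ] = 256;   // wraps to 256 % 256
demo[ 1 ] = -1;    // wraps to 256 - 1
console.log( demo );
⪢ Uint8Array(2) [0, 255]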
Let's create an unsigned 16-bit integer representation of the buffer. For that, we will use the Uint16Array constructor.
var bufferUint16 = new Uint16Array(buffer);
⪢ Uint16Array(2) [16640, 0]
bufferUint16 is an array of 2 elements. This should be obvious by now, as BYTES_PER_ELEMENT of a Uint16Array is 2. Unlike the Uint8Array in the previous example, we have some initial values, because the buffer is not empty this time.
However, the result seems pretty weird. Before, we had the values [0 65 0 0] in the buffer (when viewed as an 8-bit array), so in a 16-bit representation, it should be [65 0]. Then what happened?
Endianness
This is due to the endianness of the computer system. Our buffer currently contains the binary representation below.
[ 00000000 01000001 00000000 00000000 ]
When we view this memory buffer as a Uint8Array, it gives [0 65 0 0] by converting each byte to an unsigned integer. When we view the same memory buffer as a Uint16Array, it considers 2-byte chunks and converts them to unsigned integers.
[ 00000000 01000001   00000000 00000000 ]
  -----------------   -----------------
        INT 1               INT 2
However, the conversion from a series of bytes to an integer is not consistent across all computer architectures. In a big-endian system, the most significant byte is stored at the first index and the least significant byte is stored at the end.
Hence, the CPU reads bytes from the most significant byte to the least significant byte to construct a binary number.
00000000 01000001 => 0000000001000001 => 65
--------    --------
   MSB         LSB
In a little-endian system, the least significant byte is stored at the first index and the most significant byte is stored at the end. The CPU still reads memory from the first byte to the last, but it treats the first byte as the least significant, so the binary number is constructed in the reverse direction.
00000000 01000001 => 0100000100000000 => 16640
--------    --------
   LSB         MSB
There is a whole debate on why little-endian is better than big-endian, but most modern processors have settled on little-endian. In a nutshell, little-endian stores the least significant byte first, so a processor can start performing arithmetic on it while the next byte is still being fetched. Read more.
The system I am working on is little-endian, hence it gives an unexpected result when we try to view the buffer as unsigned 16-bit integer values. This is because in a little-endian system, the memory [00000000 01000001] is read as 01000001 <- 00000000, and in the decimal number system, 0100000100000000 is 16640.
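We can detect the endianness of the machine we are running on with a couple of lines of JavaScript: write a known 16-bit value and check which byte lands first in memory.
var testBuffer = new ArrayBuffer( 2 );
new Uint16Array( testBuffer )[ 0 ] = 0x0041;         // write 65 as one 16-bit integer
var firstByte = new Uint8Array( testBuffer )[ 0 ];   // inspect the first byte in memory
firstByte === 0x41 ? 'little-endian' : 'big-endian';
⪢ "little-endian"     // on most modern machines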
DataView
Similar to Typed Arrays, DataView gives us a low-level interface to work with an ArrayBuffer. But unlike Typed Arrays, DataView lets us control the byte order explicitly, so we do not have to worry about the endianness of the system.
By instantiating DataView with a buffer object, we get a few methods to manipulate underlying data. Similar to Typed Array construction, we can optionally pass the offset and length values (documentation).
var bufferDataView = new DataView(buffer);
⪢ DataView(4) { byteLength: 4, byteOffset: 0 }
While working on a DataView object, we have a few getter and setter methods. Getter methods help us convert single or multiple blocks of bytes to a valid typed number while setter methods update the underlying buffer.
bufferDataView.getUint8(1); // 65
bufferDataView.getUint16(0); // 65
bufferDataView.getUint16(1); // 16640
The getUint8(index) method reads a byte at the given index in the buffer and converts it to an unsigned integer. In the above example, we are reading the byte at index 1, which is 01000001, hence we get the value 65.
The getUint16(index) method reads 2 bytes starting from the given index in the buffer. While converting these bytes to an unsigned integer, it doesn't care about the system's endianness: by default, it reads the bytes from left to right (big-endian).
In the 2nd statement, getUint16 starts reading from the 1st byte (index 0). Hence, from the binary number 00000000 01000001, we get the value 65.
The third statement follows the same logic. Because we are reading 2 bytes starting from the 2nd byte (index 1), from the binary number 01000001 00000000 we get the value 16640.
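The getter (and setter) methods also accept an optional final boolean argument; passing true asks DataView to read or write the bytes in little-endian order instead of the default big-endian order.
bufferDataView.getUint16( 1, true );   // read bytes 0x41 0x00 as little-endian
⪢ 65
bufferDataView.getUint16( 0, true );   // read bytes 0x00 0x41 as little-endian
⪢ 16640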
If we want to update buffer with some values, we can use the setter methods of DataView. Let's say we want the entire 4-byte buffer to store an unsigned integer; then we should use the setUint32 method.
bufferDataView.setUint32( 0, 65 );
In the above example, we are storing an integer value of 65 in the buffer. Since we are using setUint32 method, it will consider 4-bytes to store the data starting from the 1st byte.
In binary representation, 65 as a 32-bit unsigned integer looks like [00000000 00000000 00000000 01000001]. And if we inspect the buffer as a Uint8Array, we get the expected result.
bufferUint8
⪢ Uint8Array(4) [0, 0, 0, 65]
DataView offers many getter (read) and setter (write) methods to manipulate underlying buffer however we want. Read MDN documentation.
Storing some meaningful data in memory
So far we have acquired the knowledge of text encoding and the ArrayBuffer data structure. Let's use them to store some plain text data in memory. We will also store some image data in memory and paint something on the screen.
Working with Plain Text data
First, we need some text. Let's use the classic Hello World! example. To store anything in memory, we need a binary representation of the data. To represent plain text data in binary, we can use UTF-8 encoding.
Since UTF-8 and ASCII encodings are compatible, we can use the ASCII table to find the code-points of each character of the Hello World! string. But JavaScript provides a much better API to do the same.
In JavaScript, a string literal is an instance of the String class, and it provides the charCodeAt prototype method to find the code of the character at a given index in the string. In a nutshell, charCodeAt returns an integer.
Most JavaScript engines encode a string in UTF-16. Hence, each character is saved in one or two 16-bit blocks. The maximum value of a 16-bit binary number is 65535. If a character is represented in just one code-unit, charCodeAt returns an integer less than 65536.
But if a character takes two code-units, only the value of the first code-unit is returned. Hence charCodeAt always returns a value less than 65536. In this case, the endianness of the system matters (read the Wikipedia article on how UTF-16 bytes are ordered).
There is a much better codePointAt prototype method that always returns the real code-point of a character, but it is not supported in IE.
However, since we are playing with simple ASCII characters, we don't have to worry about charCodeAt returning the wrong number. We also don't have to be bothered by UTF-16 encoding, as all UTF encodings share the same code points.
So let's loop over Hello World! and get an array of UTF code points.
var chars = "Hello World!".split( '' );
var codePoints = chars.map( c => c.charCodeAt(0) );
⪢ (12) [72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100, 33]
In the above example, we are splitting the string into individual characters and returning an array of code points using the .map prototype method.
JavaScript provides the fromCharCode static method on the String class which takes a sequence of UTF-16 code units and returns a string.
var chars = codePoints.map( cp => String.fromCharCode( cp ) );
var str = chars.join( '' );
âȘą "Hello World!"
💡 In the above example, we just converted the code points back to the original string from an array of integers (code points). The real goal, though, is to represent plain text data in memory and decode text data from it.
Since UTF-8 is easy to understand and encode, we will store characters as per UTF-8 encoding. Since we are dealing with ASCII characters, each character requires only one code unit or 8 bits of memory.
Let's create an ArrayBuffer of 12 bytes, 1 byte for each character, since we are dealing with 12 ASCII characters.
var buffer = new ArrayBuffer(12);
var bufferUint8Array = new Uint8Array(buffer);
⪢ Uint8Array(12) [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Now that we have a Uint8Array representation of buffer, we can put character code-points in the buffer.
codePoints.forEach( ( cp, index ) => {
    bufferUint8Array[ index ] = cp;
} );
console.log( bufferUint8Array );
⪢ Uint8Array(12) [72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100, 33]
In the above example, we have looped over the codePoints array and stored the integer code-point value of each character at its index in the string.
JavaScript itself does not provide an interface to work with UTF-8 encoded data. However, modern browsers provide the TextDecoder interface (Web API) to decode plain text binary data from a buffer. At the time of writing it is documented as an experimental Web API, so check browser support before using it in production.
var utf8decoder = new TextDecoder( 'utf-8' );
utf8decoder.decode(buffer);
⪢ Hello World!
In the above example, we have created a UTF-8 text decoder, which gives us a decoder object. Using its decode prototype method, we can convert a UTF-8 text buffer to a string. Here, the decode method returns a String object.
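The encoding direction is covered by the complementary TextEncoder interface, which always encodes a string to UTF-8 and would have saved us the manual charCodeAt loop above.
var utf8encoder = new TextEncoder();
utf8encoder.encode( 'Hello World!' );
⪢ Uint8Array(12) [72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100, 33]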
💡 If you are working in a Node.js environment, you should consider the string_decoder built-in package.
Working with Bitmap images
To understand Bitmap encoding, please have a look at my article on Bitmap images. In that article, you will learn what the different sections of binary data signify. I am going to use the first example mentioned there.
We are going to create a 5px x 5px Bitmap image of 16-bit color depth. In hex representation, this file will look like below.
Source: gist
42 4D 00 00 00 00 00 00 00 00 36 00 00 00 28 00 00 00 05 00 00 00 05 00 00 00 01 00 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 FF 7F 00 00 00 00 00 00 E0 7F 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 E0 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 7C 00 00 00 00 00 00 1F 00 00 00
This file contains 114 bytes, hence we have to create an ArrayBuffer of size 114. We also have to create an array of individual byte strings.
var bitmapBuffer = new ArrayBuffer( 114 );
var bitmapUint8Array = new Uint8Array( bitmapBuffer );
var byteStrings = '42 4D 00 00 ... '.split( ' ' );
console.log( byteStrings );
âȘą (114) ["42", "4D", "00", "00", ... ]
Let's initialize the values of each byte of bitmapBuffer from the values of the byteStrings array. However, an item of byteStrings is a string which represents a hexadecimal number. We can parse a string to an integer using the parseInt function, but we also need to pass the base, which is 16 for hex.
byteStrings.forEach( ( byteString, index ) => {
    bitmapUint8Array[ index ] = parseInt( byteString, 16 );
} );
console.log( bitmapUint8Array );
⪢ (114) [66, 77, 0, 0, ... ]
Now that we have Bitmap image data in a buffer, we can do all sorts of things with it. If you are working in a Node.js environment, you can directly write a file with this binary data. However, in the browser, we need to use Blob.
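For example, in Node.js a rough sketch of writing the image to disk could look like this (the file name pixels.bmp is arbitrary):
const fs = require( 'fs' );
fs.writeFileSync( 'pixels.bmp', Buffer.from( bitmapBuffer ) );   // wrap the ArrayBuffer in a Node Buffer and write it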
Blob in Browsers
A Blob object is a file-like object of immutable, raw binary data held in memory. The Blob constructor accepts an array containing a variety of different data structures, but we are interested in ArrayBuffer as the input data.
var bmpFile = new Blob( [ bitmapBuffer ], { type: 'image/bmp' } );
console.log( bmpFile );
⪢ Blob {size: 114, type: 'image/bmp'}
As you can see from the above example, we have created a Blob object by passing an array which contains bitmapBuffer which is an ArrayBuffer. We have also supplied a MIME Type so that other Web APIs can understand this file correctly as a Bitmap image.
A Blob object isn't very useful on its own and we need other Web APIs to make use of it. One great use of a Blob object is to create an absolute URL that points to a file-like object stored in memory. Such URLs can be generated using the URL interface (Web API).
The createObjectURL static method of the URL class takes a Blob object and returns a URL. The Blob object will be kept in memory as long as an objectURL created from it exists. To release the Blob object (if it is not referenced anywhere else), we need to call the URL.revokeObjectURL method with that objectURL.
var bmpFileURL = URL.createObjectURL( bmpFile );
console.log( bmpFileURL );
âȘą "blob:https://medium.com/bb3ffdb6-3ab5-4e3e-80fa-0b9bf9b11585"
You can open this URL in the same tab or in a new tab.
I am not 100% sure about this, but opening an objectURL in a new tab works because an objectURL is associated with the hostname or domain name. Google Chrome uses a single process (hence the same JavaScript thread and heap) for browser tabs that share the same host.

You can also use this URL in an <img> element's src attribute, an <a> element's href attribute, or in CSS to add a background image using url().
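A minimal sketch of displaying our in-memory Bitmap image on the page:
var img = document.createElement( 'img' );
img.src = bmpFileURL;                                   // the objectURL created above
img.onload = () => URL.revokeObjectURL( bmpFileURL );   // release the Blob once the image has loaded
document.body.appendChild( img );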
