Understanding Streams in Depth
Streams are a fascinating topic in computer science. In this post, we'll dig into how they work and see how streams can help us handle data more efficiently and more reliably.
What are streams?
Have you ever wished you could analyze and act on data as it's being generated, rather than waiting for all of it to be collected before processing? Stream processing makes this possible, allowing you to transform and analyze data in real-time as it flows through your system.
Imagine you have a large file, say 5 GB, that you want to read and modify. You don't want to wait a long time before you can start processing it. So what do you do? You use streams.
Streams let you start working on the file's contents almost immediately, because they hand you the file a chunk at a time instead of making you wait for the whole thing.
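Here is a minimal sketch of what that looks like in Node.js (the file name large-file.txt is just a placeholder):
const fs = require('fs');
// Read the file chunk by chunk instead of loading all 5 GB into memory
const reader = fs.createReadStream('large-file.txt');
let bytes = 0;
reader.on('data', (chunk) => {
  // Each chunk is a Buffer (64 KB by default); process it as it arrives
  bytes += chunk.length;
});
reader.on('end', () => {
  console.log(`Done: read ${bytes} bytes without buffering the whole file.`);
});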
Video delivery platforms use streams to publish content to millions of users, letting viewers watch without worrying about file size (only small chunks are loaded at a time), data usage (only small chunks are delivered), or network bandwidth.
YouTube uses streams in a variety of ways to provide a seamless and efficient video viewing experience for its users.
This makes it possible to start watching a video right away, rather than waiting for the entire file to load in your browser first.
In this blog, we'll explore how streams work and why they are so useful for handling large volumes of data, with examples.
There are a few key characteristics of streams:
- Data elements are produced and made available one at a time, rather than all at once.
- The data elements in a stream are typically processed and consumed as they are produced, rather than being stored for later processing.
- Streams can be unbounded, meaning they have no fixed end, or they can be finite, with a fixed number of data elements.
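To make these characteristics concrete, here is a minimal sketch that consumes chunks as they are produced, using the fact that Node.js readable streams are async-iterable (an input.txt file is assumed to exist):
const fs = require('fs');
async function main() {
  const reader = fs.createReadStream('input.txt', { encoding: 'utf8' });
  // Each chunk is handled as soon as it is produced, never stored up first
  for await (const chunk of reader) {
    console.log(`Received ${chunk.length} characters`);
  }
  console.log('Stream ended');
}
main();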
There are many types of streams in computer systems, including file streams, network streams, and memory streams. File streams allow you to read and write data to and from files on a storage device. Network streams allow you to read and write data over a network connection. Memory streams allow you to read and write data to and from a memory buffer.
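As a quick illustration of a network stream, here is a minimal sketch of an HTTP server that pipes a file into the response, which is itself a writable stream over the network (the file name video.mp4 is a placeholder):
const fs = require('fs');
const http = require('http');
// Each HTTP response is a writable network stream; piping a file stream
// into it sends the file to the client chunk by chunk
const server = http.createServer((req, res) => {
  res.writeHead(200, { 'Content-Type': 'video/mp4' });
  fs.createReadStream('video.mp4').pipe(res);
});
server.listen(3000, () => console.log('Listening on http://localhost:3000'));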
Streams can be used in a variety of contexts, including file I/O, network communication, and data processing. They are particularly useful when working with large amounts of data, as they allow data to be processed in a “streaming” manner, rather than having to be loaded into memory all at once.
There are several different types of streams that can be used in computer systems, including:
- Byte streams: Byte streams are streams that transfer data in the form of bytes, which are the basic unit of data in a computer system. Byte streams are the most general-purpose type of stream, and can be used to transfer data of any type.
const fs = require('fs');
// Create a readable stream to read data from a file
const reader = fs.createReadStream('input.txt');
// Create a writable stream to write data to a file
const writer = fs.createWriteStream('output.txt');
// Pipe the data from the input file to the output file
reader.pipe(writer);
- Character streams: Character streams are streams that transfer data in the form of characters, which are used to represent text. Character streams are often used for working with text data, and can automatically handle tasks such as character encoding and decoding.
const fs = require('fs');
// Create a readable stream to read data from a file
const reader = fs.createReadStream('input.txt', { encoding: 'utf8' });
// Create a writable stream to write data to a file
const writer = fs.createWriteStream('output.txt', { encoding: 'utf8' });
// Pipe the data from the input file to the output file
reader.pipe(writer);
- Buffered streams: Buffered streams are streams that use a buffer to store data temporarily, in order to improve the efficiency of data transfer. Buffered streams can reduce the number of read and write operations that are performed, and can improve the performance of a system.
const fs = require('fs');
const { Transform } = require('stream');
// Create a transform stream with an explicit buffer size; highWaterMark
// sets how many bytes the stream buffers internally before applying backpressure
const transformer = new Transform({
  highWaterMark: 64 * 1024, // buffer up to 64 KB at a time
  transform(chunk, encoding, callback) {
    // Process the data in the chunk
    // ...
    callback(null, chunk);
  }
});
// Create a readable stream to read data from a file
const reader = fs.createReadStream('input.txt');
// Create a writable stream to write data to a file
const writer = fs.createWriteStream('output.txt');
// Pipe the data through the transform stream
reader.pipe(transformer).pipe(writer);
- Filtered streams: Filtered streams are streams that apply a transformation to the data that is being transferred. Filtered streams can be used to perform tasks such as compression, encryption, and decryption.
const fs = require('fs');
const zlib = require('zlib');
// Create a readable stream to read data from a file
const reader = fs.createReadStream('input.txt');
// Create a writable stream to write the compressed data to a file
const writer = fs.createWriteStream('output.txt.gz');
// Create a transform stream to compress the data as it is being transferred
const compressor = zlib.createGzip();
// Pipe the data through the transform stream
reader.pipe(compressor).pipe(writer);
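Decompression works the same way in reverse; here is a minimal sketch that restores the compressed file produced above:
const fs = require('fs');
const zlib = require('zlib');
// Pipe the compressed file through a gunzip stream to recover the original text
fs.createReadStream('output.txt.gz')
  .pipe(zlib.createGunzip())
  .pipe(fs.createWriteStream('restored.txt'));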
- Object streams: Object streams are streams that transfer data in the form of objects, which are complex data structures that can contain a variety of data types. Object streams can be used to transfer objects between different parts of a system, or between systems.
const fs = require('fs');
const { Readable, Transform } = require('stream');
// Create a readable stream in object mode that emits JavaScript objects
const source = Readable.from([
  { name: 'John Smith', age: 30 }
]);
// Create a transform stream that accepts objects and serializes each one
// to a line of JSON so it can be written out as text
const serializer = new Transform({
  writableObjectMode: true, // accept objects on the writable side
  transform(obj, encoding, callback) {
    callback(null, JSON.stringify(obj) + '\n');
  }
});
// Create a writable stream to write the serialized data to a file
const writer = fs.createWriteStream('output.txt');
// Pipe the objects through the serializer and into the output file
source.pipe(serializer).pipe(writer);
- Stream pipelines: The pipeline utility from the stream module chains a readable stream, any number of transform streams, and a writable stream into a single processing pipeline. It forwards errors from any stage to one callback and cleans up all the streams if any stage fails. The stages work concurrently with one another as chunks flow through, though each transform still processes one chunk at a time (Node streams do not run transforms in parallel out of the box).
const fs = require("fs");
const { pipeline, Transform } = require("stream");
// Create a transform stream that upper-cases each chunk of text
const transformer = new Transform({
  transform(chunk, encoding, callback) {
    // Process the data chunk and pass it on to the next stream
    this.push(chunk.toString().toUpperCase());
    callback();
  },
});
// Create a writable stream that appends the transformed data to a file
const writer = fs.createWriteStream("sample_files/output.txt", { flags: "a" });
// Pipe the data from the readable stream through the transform stream and
// into the write stream; the callback runs once, on completion or error
pipeline(
  fs.createReadStream("sample_files/input.txt"),
  transformer,
  writer,
  (err) => {
    if (err) {
      console.error("Pipeline failed.", err);
    } else {
      console.log("Pipeline succeeded.");
    }
  }
);
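Newer versions of Node.js (15+) also ship a promise-based pipeline in the stream/promises module, which fits naturally with async/await; a minimal sketch:
const fs = require('fs');
const { pipeline } = require('stream/promises');
async function run() {
  // Same idea as above, but awaitable; it rejects if any stage fails
  await pipeline(
    fs.createReadStream('sample_files/input.txt'),
    fs.createWriteStream('sample_files/copy.txt')
  );
  console.log('Pipeline succeeded.');
}
run().catch((err) => console.error('Pipeline failed.', err));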
Some examples of how YouTube uses streams include:
- Streaming video content to users: When a user loads a video on YouTube, the video is streamed to their device in real-time as they watch it. This allows the user to begin watching the video almost immediately, without having to wait for the entire video to download.
- Streaming live video content: YouTube also uses streams to deliver live video content to users. When a user subscribes to a channel that broadcasts live content, they can watch the live stream in real-time as it is being broadcast.
- Streaming audio content: YouTube also uses streams to deliver audio content to users. For example, users can listen to music tracks or audio recordings on YouTube by streaming them in real-time.
- Transcoding video content: YouTube uses streams to transcode and process video content uploaded by users. When a user uploads a video to YouTube, the video is streamed through a series of processing steps that convert it into formats that can be streamed to users. A rough sketch of this idea follows after this list.
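To be clear, this is not YouTube's actual pipeline, but here is a minimal sketch of the idea: streaming an uploaded video through an external transcoder. It assumes the ffmpeg command-line tool is installed, and the file names are placeholders:
const fs = require('fs');
const { spawn } = require('child_process');
// Spawn ffmpeg to transcode upload.mp4 to WebM, streaming the result to a
// file as it is produced; the child process's stdout is a readable stream
const ffmpeg = spawn('ffmpeg', ['-i', 'upload.mp4', '-f', 'webm', 'pipe:1']);
ffmpeg.stdout.pipe(fs.createWriteStream('transcoded.webm'));
ffmpeg.stderr.pipe(process.stderr); // ffmpeg logs progress to stderr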
Overall, YouTube uses streams to efficiently deliver a wide range of content to users in real-time, allowing users to watch and listen to content as it becomes available.
Conclusion: Streams can be used to efficiently process, transfer, and consume large amounts of data in real-time, making them a useful tool in a wide range of applications.
I hope this gives you a sense of how streams can be used. If you have any questions, let me know in the comments.
If you liked this blog and would like to read more like it, check out my other blogs:
- Deep dive into System design
- Most commonly used algorithms.
- Consistent Hashing
- Bit, Bytes And Memory Management
- CAP Theorem Simplified