How to Create a Network Proxy Using Stream Processor Pipy


In this article, we introduce Pipy, an open-source cloud-native network stream processor, explain its modular design and architecture, and walk through a quick example of getting a high-performance network proxy up and running to serve specific needs. Pipy has been battle-tested and is already in use by multiple commercial clients.

Pipy is an open-source, lightweight, high-performance, modular, programmable, cloud-native network stream processor suitable for a variety of use cases, including (but not limited to) edge routers, load balancers and proxies, API gateways, static HTTP servers, service mesh sidecars, and other applications.

As it turns out, each of those attributes of Pipy has a pretty specific meaning of its own, so let’s take a look.


The compiled Pipy executable is approximately 6 MB in size and has a very small runtime memory footprint.

Pipy is written in C++ and built on top of the Asio asynchronous I/O library.

Pipy has a modular design at its core, with many small reusable modules (filters) that can be linked together to form a pipeline through which network data flows and is processed.

Pipy operates on network streams using an event-driven pipeline: it consumes the input stream, performs user-provided transformations, and outputs the resulting stream. Pipy abstracts the raw bytes of a stream into events belonging to one of four categories: Data, MessageStart, MessageEnd, and StreamEnd.

Network streams are composed of data bytes and arrive in chunks; Pipy wraps each chunk in a Data event.

The other three, non-data events work as markers, giving the raw byte stream the high-level semantics that business logic relies on.

Pipy comes with built-in JavaScript support via its custom-developed component PipyJS, which is part of the Pipy code base but has no dependency on it. PipyJS is highly customizable and delivers predictable performance with no garbage collection overhead. In the future, PipyJS might move into its own standalone package.

The internal workings of Pipy are similar to Unix pipelines, but unlike Unix pipelines, which deal with discrete bytes, Pipy deals with streams of events.

Pipy processes incoming streams via a chain of filters, where each filter deals with general concerns like request logging, authentication, SSL offloading, request forwarding, etc. Each filter reads from its input and writes to its output, with the output of one filter connected to the input of the next.

A chain of filters is called a pipeline, and Pipy divides pipelines into three categories according to their input sources: Port pipelines, Timer pipelines, and sub-pipelines.

Port pipeline: reads in Data events from a network port, processes them, and then writes the result back to the same port. This is the most commonly used request/response model.

For instance, when Pipy works like an HTTP server, the input to a Port pipeline is an HTTP request from the clients, and the output from the pipeline would be an HTTP response sent back to clients.

Timer pipeline: gets a pair of MessageStart and MessageEnd events as its input periodically. Useful when cron-job-like functionality is required.

Sub-pipeline: works in conjunction with a join filter, such as link, which takes in events from its predecessor pipeline, feeds them into a sub-pipeline for processing, reads back the output from the sub-pipeline, and then pumps it down to the next filter.

The best way to look at sub-pipelines and join filters is to think of them as callees and callers of a subroutine in procedural programming: the input to the join filter is the subroutine’s parameters, and the output from the join filter is its return value.

Note: root pipelines like Port & Timer cannot be called from join filters.
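A minimal sketch of this call-like relationship, using the named-pipeline PipyJS style of the Pipy version described here; demuxHTTP and link act as the join filters, and 'request' and 'greet' are the sub-pipelines (the port and response text are illustrative):

```js
pipy()

// Port (root) pipeline: accepts connections on port 8080.
.listen(8080)
  .demuxHTTP('request')       // join filter: each decoded HTTP request is fed
                              // into the sub-pipeline named 'request'

// Sub-pipeline: the "callee" of demuxHTTP.
.pipeline('request')
  .link('greet')              // link is also a join filter: "call" sub-pipeline 'greet'

// Sub-pipeline: builds the response that flows back up through demuxHTTP.
.pipeline('greet')
  .replaceMessage(
    new Message({ status: 200 }, 'Hi, there!\n')
  )
```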

Another important notion in Pipy is that of contexts. A context is a set of variables attached to a pipeline. Every pipeline gets access to the same set of variables across a Pipy instance. In other words, contexts have the same shape. When you start a Pipy instance, the first thing you do is to define the shape of the context by defining variable(s) and their initial values.

Every root pipeline clones the initial context you define at the start. When a sub-pipeline starts, it either shares or clones its parent’s context, depending on which join filter you use. For instance, a link filter shares its parent’s context while a demux filter clones it.

To the scripts embedded in a pipeline, these context variables are their global variables, which means that these variables are always accessible to scripts from anywhere as long as they live in the same script file.

This might seem odd to a seasoned programmer, because global variables are usually globally unique: you have only one set of them, whereas in Pipy there can be many sets of them (that is, many contexts), depending on how many root pipelines are open for incoming network connections and how many sub-pipelines clone their parents’ contexts.
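A minimal sketch of contexts in practice, assuming the pipy() entry point that takes the initial context variables as an object; the counter and port are illustrative:

```js
pipy({
  // This object defines the "shape" of the context: one variable, initial value 0.
  // Every root pipeline (incoming connection) starts from its own clone of it.
  _requestCount: 0,
})

.listen(8080)
  .serveHTTP(
    // _requestCount reads like a global variable in the script, but each
    // connection's context holds its own copy, so the count is per connection.
    () => new Message('request #' + (++_requestCount) + ' on this connection\n')
  )
```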

PipyJS is a small, embeddable JavaScript engine designed for high performance with no garbage collection overhead. It supports a subset of the ECMAScript standard and deviates from it in some areas. Currently it supports JavaScript expressions and functions, and it implements standard JavaScript APIs such as String and Array.

Contexts, as described above, are a crucial feature of PipyJS, and Pipy extends them to fit the particular needs of proxy servers, which require a separate set of global variables for each connection. Each context’s state is invisible to other contexts, so it is accessible only to the pipeline that owns it.

If you are familiar with multi-threaded programming concepts, you can also think of contexts as TLS (thread-local storage), where global variables have different values across different threads.

Pipy is designed for high compatibility across different operating systems and CPU architectures and has been fully tested on a range of platforms. CentOS 7/RHEL 7 or FreeBSD are recommended for production environments.

For the impatient, we can run the production version of Pipy via Docker with one of the tutorial scripts provided in the official Pipy GitHub repository. Let’s follow the norm of the classic Hello World!, but change the wording to Hi, there!

The Pipy Docker image can be configured with a few environment variables; the most relevant one here is PIPY_CONFIG_FILE, which points Pipy at the script it should run.
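A sketch of such an invocation; the image name flomesh/pipy-pjs and the exact script URL are assumptions based on the Pipy repository layout, while PIPY_CONFIG_FILE is the environment variable discussed below:

```sh
docker run --rm -p 8080:8080 \
  -e PIPY_CONFIG_FILE=https://raw.githubusercontent.com/flomesh-io/pipy/main/tutorial/01-hello/hello.js \
  flomesh/pipy-pjs:latest
```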

This will start the Pipy server with the provided script. Keen users might have noticed that instead of a local file we have provided the link to a remote Pipy script via the environment variable PIPY_CONFIG_FILE and Pipy is smart enough to handle such cases.

For your reference, this is the contents of file tutorial/01-hello/hello.js:
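The script is essentially a single Port pipeline (reproduced here as a sketch; check the repository for the authoritative version):

```js
pipy()

// Port pipeline: listen on 8080 and answer every HTTP request with a fixed message.
.listen(8080)
  .serveHTTP(
    new Message('Hi, there!\n')
  )
```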

In this script, we have defined one Port pipeline which listens on port 8080 and returns “Hi, there!” for each HTTP request received on the listening port.

As we have exposed local port 8080 via the above docker run command, we can proceed with a test on the same port:
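A quick test with curl (any HTTP client will do):

```sh
curl localhost:8080
```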

Executing the above command should display Hi, there! in the console.

For learning, development, or debugging purposes, it’s recommended to install Pipy locally (either build it from source or download a pre-built release for your specific OS), as the local installation comes with an admin web console along with documentation and tutorials.

Once installed locally, running pipy without any arguments starts the admin console on port 6060, but it can be configured to listen on a different port via the --admin-port argument.

To build Pipy from source or to install a pre-compiled binary for your operating system, please refer to the README.md in the Pipy GitHub repository.

To start a Pipy proxy, run Pipy with a PipyJS script file, for example the script in tutorial/01-hello/hello.js if you need a simple server that responds with the same message to every incoming request:
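Assuming the command is run from the root of the Pipy source tree (otherwise substitute the full path to the script):

```sh
pipy tutorial/01-hello/hello.js
```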

Alternatively, while developing and debugging, you can start Pipy with its built-in web UI:
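A sketch using the --admin-port option mentioned above; the port number is only a convention:

```sh
pipy tutorial/01-hello/hello.js --admin-port=6060
```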

That was a quick conceptual and technical introduction to Pipy, and it gives us the background we need to write a network proxy with caching and load-balancing support, which we will do in the next section.

Suppose we are running separate instances of different services and we would like to add a proxy that forwards traffic to the relevant service based on the request URL path. This gives us the benefit of exposing a single URL and scaling our services in the backend without users having to remember distinct service URLs. In normal situations, your services would be running on different nodes and each service could have multiple instances running. In this example, though, we assume we are running the services below and want to distribute traffic to them based on the URI.
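For concreteness, assume two illustrative services behind the proxy; the service names and the /echo route are placeholders, while the ports, the /hi route, and the proxy port 8000 match what we use later. Expressed as the kind of mapping our config files (config/router.json and config/balancer.json) will hold:

```json
{
  "routes": {
    "/hi/*": "service-hi",
    "/echo": "service-echo"
  },
  "services": {
    "service-hi": ["localhost:8080", "localhost:8081"],
    "service-echo": ["localhost:8082"]
  }
}
```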

Pipy scripts are written in JavaScript and you can use any text editor of your choice to edit them. Alternatively, if you have installed Pipy locally, you can use Pipy admin Web UI, which comes with syntax highlighting, autocompletion, hints, as well as the possibility of  running scripts, all from the same console.

So let’s start a Pipy instance without any arguments, so that the Pipy admin console starts on port 6060. Now open your favorite web browser and navigate to http://localhost:6060. You will see the built-in Pipy Administration Web UI (Figure 1).

A good design practice is to keep code and configuration separate. Pipy supports such a modular design via its plugins, which you can think of as JavaScript modules. Accordingly, we will store our configuration data under the config folder and our logic in separate files under the plugins folder. The main proxy server script (proxy.js) lives in the root folder and includes and combines the functionality defined in the separate modules. Once we are done with the steps detailed below, our final folder structure will look like the sketch below.
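Here proxy.js, config/proxy.json, config/balancer.json, and plugins/router.js are the files named in this article; the remaining names follow the Pipy tutorials and are assumptions:

```
proxy.js                  # main proxy script, combines the plugins
config/
  proxy.json              # listening port (8000) and the list of plugins to load
  router.json             # route-to-service mapping (assumed name)
  balancer.json           # service-to-target (host:port) mapping
plugins/
  router.js               # routing logic
  balancer.js             # load-balancing logic (assumed name)
  default.js              # fallback handler for unmatched routes (assumed name)
```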

So let’s start:

7. Create the file /plugins/router.js, which stores our routing logic:
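A minimal sketch of what such a plugin can look like, using the named-pipeline PipyJS style of the Pipy version described here; it assumes a config/router.json holding the routes shown earlier and a __serviceID variable exported by the main proxy.js script (both assumptions), with the chosen service consumed later by the balancer plugin:

```js
// plugins/router.js — routing sketch, not necessarily the article's exact file.
(config =>

pipy({
  // URLRouter maps a host/path to the configured value, here a service name.
  _router: new algo.URLRouter(config.routes),
})

// __serviceID is assumed to be exported by proxy.js under the 'proxy' namespace;
// we set it here and the balancer plugin picks a concrete target from it.
.import({
  __serviceID: 'proxy',
})

// Sub-pipeline called for every request by the main proxy script.
.pipeline('request')
  .handleMessageStart(
    msg => (
      __serviceID = _router.find(
        msg.head.headers.host,
        msg.head.path,
      )
    )
  )

)(JSON.decode(pipy.load('config/router.json')))
```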

If you have followed the steps above, you will have something similar to what you see in the screenshot below:

Now let’s run our script by hitting the play icon button (fourth from the right). If we didn’t make any mistakes in our scripts, Pipy will run our proxy script and print its listening port to the console.

That shows our proxy server is listening on port 8000 (which we configured in our /config/proxy.json). Let’s use curl to run a test:
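For example (the exact response text depends on how unmatched routes are handled; expect a 404-style answer here):

```sh
curl -i localhost:8000/
```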

That makes sense, as we haven’t configured any target for the root path. Let’s try one of our configured routes, e.g. /hi:
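Assuming the proxy is still listening on port 8000:

```sh
curl -i localhost:8000/hi
```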

We get 502 Connection Refused as we have no service running on our configured target port.

You can update /config/balancer.json with details like the host and port of services you already have running to fit your use case, or we can simply write a Pipy script that listens on our configured ports and returns simple messages.

Save the snippet below to a file named mock-proxy.js on your local computer, and remember where you stored it.
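A minimal sketch of such a mock script, consistent with the ports 8080–8082 used below; the response texts and service names are illustrative:

```js
// mock-proxy.js — mock services for local testing.
pipy()

// Each .listen() starts a Port pipeline on its own port and
// answers every HTTP request with a fixed message.
.listen(8080)
  .serveHTTP(new Message('Hi, I am instance 1 of service-hi (port 8080)\n'))

.listen(8081)
  .serveHTTP(new Message('Hi, I am instance 2 of service-hi (port 8081)\n'))

.listen(8082)
  .serveHTTP(new Message('Hi, I am service-echo (port 8082)\n'))
```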

Open a new terminal window and run this script via Pipy (where /path/to refers to the location where you stored the script file):
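Assuming the pipy executable is on your PATH:

```sh
pipy /path/to/mock-proxy.js
```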

Now we have mock services listening on ports 8080, 8081, and 8082, so let’s test our proxy server again; this time you will see the correct response returned from a mock service.
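For example (with the round-robin load balancer in place, consecutive requests to /hi may be answered by different instances):

```sh
curl localhost:8000/hi
curl localhost:8000/hi
```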

We have used a number of Pipy features, including variable declaration, importing/exporting variables, plugins, pipelines, sub-pipelines, filter chaining, Pipy filters like handleMessageStart, handleStreamStart, and link, and Pipy classes like JSON, algo.URLRouter, algo.RoundRobinLoadBalancer, and algo.Cache. A thorough explanation of all these concepts is beyond the scope of this article, but you are encouraged to read the Pipy documentation, accessible via Pipy’s admin web UI, and follow the step-by-step tutorials that come with it.

Pipy from Flomesh is an open-source, extremely fast, and lightweight network traffic processor that can be used in a variety of use cases ranging from edge routers, load balancing and proxying (forward/reverse), and API gateways to static HTTP servers, service mesh sidecars, and many other applications. Pipy is in active development and is maintained by full-time committers and contributors. Although still at an early version, it has been battle-tested and is in production use by several commercial clients. Its creator and maintainer, Flomesh.cn, offers commercial-grade solutions built with Pipy at their core.

This article provided a brief, high-level introduction to Pipy. Step-by-step tutorials and documentation can be found on its GitHub page or accessed via the Pipy admin console web UI. The community is welcome to contribute to Pipy’s development, try it out for their particular use cases, or provide feedback and insights.
