About Data Flow¶
What is Data Flow?¶
While it might not look like it, Data Flow is a mainstream technology, as seen in:
Unix Pipes
DSP Programming // Max, Pure Data, NI
Visual Scripting // Unity, Unreal
Graphics Pipelines // Blender
Data Analytics // Pig, Apache NiFi
IoT // NoFlo, Node-RED
DAG Workflow // Luigi, Airflow
It is common in engineering disciplines too:
LabVIEW
Simulink
Ptolemy
PLC
Most of these approaches implement Data Flow with a push model. A complete Data Flow engine in some sense provides Unix pipes + OS + shell + top + ps + linker … Pull-based Data Flow (FBP) is not popular yet; I think it is a matter of education, simplicity of implementation, and applications. The aim of this book is to help with the education part for both push- and pull-based approaches, among others. The subsequent chapters will explore the philosophy of Data Flow, walk through implementations, and give plenty of practical examples.
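To make the push model concrete, here is a minimal sketch of a push-based pipeline (the class and method names are illustrative, not from any particular engine): each upstream node calls its downstream nodes as soon as a packet is produced, much like Unix pipes.

```python
# Minimal push-based (data-driven) pipeline: a node sends each
# packet downstream the moment it is produced.

class Node:
    def __init__(self):
        self.targets = []              # downstream connections

    def connect(self, other):
        self.targets.append(other)
        return other

    def send(self, packet):            # push a packet downstream
        for target in self.targets:
            target.receive(packet)

class Doubler(Node):
    def receive(self, packet):
        self.send(packet * 2)

class Collector(Node):
    def __init__(self):
        super().__init__()
        self.results = []

    def receive(self, packet):
        self.results.append(packet)

source = Node()
doubler = Doubler()
sink = Collector()
source.connect(doubler)
doubler.connect(sink)

for x in [1, 2, 3]:
    source.send(x)                     # the source drives execution

print(sink.results)                    # [2, 4, 6]
```

Note that control flows top-down from the source: the sink is entirely passive, which is the defining trait of a push-based system.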
Technologies competing with Data Flow
Message Queues / Event Driven Programming / Actor Model
State Machines / HSM / Mealy State Machines / REST
Linda / Entity Systems
Workflows
Currently Data Flow is mostly used for batch systems, but that need not be the case. With some tweaks, Data Flow can effectively replace REST / microservices and even the interaction-processing code on the front end. Flux and FRP show how signal-flow-based approaches are useful for reasoning about complex interactions on the front end.
Unix pipes are the “worse is better” approach to flow-based programming. Data Flow provides useful abstractions for modeling the backend. It can provide a visual layer for the backend, much like what Webflow does for CSS. You don’t have to use visual representations to use Data Flow, though, as the code and visual representations map to each other 1-1. While you can write your backend, UI, and frontend with just code, adding a visual layer helps. One reason to favor text over visuals is typing speed vs drawing speed.
FBP, in Data Flow terminology, is a pull-based system, as opposed to the push-based systems that message queues and Unix pipes are, because of the suspend / resume semantics of coroutines. Push-based systems are popular because they are easier to implement, and worse is better.
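The suspend / resume behavior described above can be sketched with Python generators, which suspend at each `yield` and resume only when the downstream node pulls the next value. This is only an illustration of demand-driven flow, not a full FBP runtime:

```python
# Minimal pull-based (demand-driven) pipeline: the sink drives
# execution by pulling; each generator suspends between packets.

def numbers(n):
    for i in range(n):
        yield i                  # suspend here until pulled again

def double(upstream):
    for packet in upstream:      # pull from upstream on demand
        yield packet * 2

def collect(upstream):
    return list(upstream)        # the sink pulls everything

pipeline = double(numbers(4))
print(collect(pipeline))         # [0, 2, 4, 6]
```

Here nothing runs until `collect` asks for data; each stage computes exactly one packet per pull and then suspends, which is the inverse of the push example's control flow.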
While software systems have used messaging architectures and model-driven development, Data Flow is broader than that and can cover all general programming cases. In fact, the main goal of writing this book is to allow you to write your own Data Flow engine so that you don’t have to rely on complex tools that weigh a ton. Theoretically, Data Flow is rooted in systems theory, and there is a long tradition of using Data Flow techniques in processors - the Harvard architecture, the Manchester dataflow machine, instruction scheduling in x86 - and in database engines.
CS Theory relevant to Data Flow
KPN (Kahn Process Networks)
Petri Net
FBP is completely asynchronous, which makes it different from KPNs and Petri Nets, which have synchronization points.
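The synchronization point of a KPN is its blocking read: a process that asks for a token waits until one arrives on the channel. A rough illustration of that behavior (not the formal model) using threads and a blocking queue:

```python
import queue
import threading

# A KPN-style channel is a FIFO whose get() blocks on empty --
# that blocking read is the synchronization point.
channel = queue.Queue()
out = queue.Queue()

def producer():
    for i in range(3):
        channel.put(i)              # writes never block (unbounded FIFO)

def consumer():
    for _ in range(3):
        token = channel.get()       # blocks until a token arrives
        out.put(token + 10)

t1 = threading.Thread(target=consumer)
t1.start()                          # consumer blocks on the empty channel
t2 = threading.Thread(target=producer)
t2.start()
t1.join(); t2.join()

print(list(out.queue))              # [10, 11, 12]
```

Starting the consumer first shows the point: it parks on `channel.get()` until the producer supplies tokens, so the two processes synchronize only through the channel.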
With Data Flow, running thousands of tests and simulations becomes trivial, as will be demonstrated in the subsequent sections. In many ways Data Flow / FBP resembles breadboard assembly in electronics. Electronics has a true component-oriented architecture and better
Testability
Longevity
Reliability
Maintainability
Quality
Basics of Data Flow¶
Data Flow is represented by a Graph, which is made of Nodes and Connections. Nodes are also called Actors / Systems / Components / Processes.
Connections are like pipes. They are also called Edges or Wires. They attach to points called portlets / ports; Inlets and Outlets signify the direction.
The data that moves is called Data / Tokens / Packets / Entities / Records / Messages.
The directionality of movement is implied by the terms Push / Pull or Data Driven / Demand Driven.
The execution mechanism of each node can be:
Reactive - Node is fired when the data arrives, asynchronously
Firing Rules - Node is fired when tokens match some firing rules
Classical - Node is fired of its own accord, when it is ready to pull data
See this presentation and this.
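As an illustration of the firing-rules mechanism, here is a minimal sketch (the class and port names are made up for this example) of a node whose rule is "one token on each inlet": it fires only when every inlet has a token queued.

```python
# A node that fires only when every inlet has at least one token,
# i.e. the firing rule "one token on each inlet".

class FiringRuleNode:
    def __init__(self, inlets, fn):
        self.queues = {name: [] for name in inlets}   # per-inlet token queues
        self.fn = fn
        self.outputs = []

    def receive(self, inlet, token):
        self.queues[inlet].append(token)
        self.try_fire()

    def try_fire(self):
        # Fire only when the rule is satisfied on all inlets.
        if all(self.queues.values()):
            args = {name: q.pop(0) for name, q in self.queues.items()}
            self.outputs.append(self.fn(**args))

adder = FiringRuleNode(["a", "b"], lambda a, b: a + b)
adder.receive("a", 1)       # only inlet "a" has a token: no firing yet
adder.receive("b", 2)       # rule satisfied -> fires with (1, 2)
adder.receive("a", 5)
adder.receive("b", 7)       # fires again with (5, 7)
print(adder.outputs)        # [3, 12]
```

A reactive node is the degenerate case with a single inlet and no rule to check, while richer engines allow rules like "a token on inlet a or inlet b".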
Some more useful terms
Coarse grained vs fine grained
Homogeneous Data vs Heterogeneous Data
Stream / Substream
Initial packet
Subgraph / Patch
Main Component
What is Data Flow useful for?¶
Data Analysis
IoT
Games
Simulation
Systems Modeling
Visual Applications
Reactive Applications like Spreadsheet
Modular Applications
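As a taste of the spreadsheet case above, here is a tiny sketch (hypothetical `Cell` / `Formula` classes, push-based propagation) in which a formula cell recomputes whenever one of its input cells changes:

```python
# Spreadsheet-style reactivity: setting an input cell pushes the
# new value through every dependent formula cell.

class Cell:
    def __init__(self, value=None):
        self.value = value
        self.dependents = []           # formula cells that read this cell

    def set(self, value):
        self.value = value
        for dep in self.dependents:    # propagate the change downstream
            dep.recompute()

class Formula(Cell):
    def __init__(self, fn, *inputs):
        super().__init__()
        self.fn, self.inputs = fn, inputs
        for cell in inputs:
            cell.dependents.append(self)
        self.recompute()

    def recompute(self):
        self.value = self.fn(*(c.value for c in self.inputs))

a = Cell(1)
b = Cell(2)
total = Formula(lambda x, y: x + y, a, b)
print(total.value)   # 3
a.set(10)            # changing an input recomputes the formula
print(total.value)   # 12
```

The cells form exactly the kind of graph this chapter describes: cells are nodes, dependencies are edges, and values flow along them whenever an input fires.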