Embrace Complexity (Part 2: Data)

Tony Seale · Published in Experience Stack · Feb 18, 2022

How to network-ify your organisation in three simple(ish) steps … starting with data

by Tony Seale

Your information does not want to be trapped in a box

In Part One of this article, we established that the industrial era organised complexity into nice, neat, linear boxes, and that whilst this was a highly effective way of building machines, this approach is no longer adequate to meet the needs of a modern organisation operating within a global network in the Information Age.

Whether we like it or not, the process of ‘network-ification’ is underway and there is nothing any organisation can do to stop it; the only realistic option is to get on board and mirror the network structure internally.

As explained in Part One, the three main driving forces of the Information Age (Data, Cloud and AI) can be unified into a single process of ‘network-ification’, so the remainder of this article is broken down into three corresponding sections:

  • DATA: how organisations can move their data out of separate box-shaped tables and into network-shaped fragments that fit together like pieces of a jigsaw, and how the people in those organisations can include their knowledge and conceptual understanding as part of those network-shaped fragments.
  • CLOUD: how organisations can keep their data where it is and integrate it ‘at source’ as opposed to moving it to a central location for industrialised cleansing and integration by a small central team, and how this decentralized architecture can democratise the data integration process within an organisation.
  • AI: how organisations can stop playing catchup and instead get on the front foot by committing to a new breed of cutting-edge AI algorithms designed to work natively with network-shaped data and the pathways that criss-cross it.

To keep things real, each section will end with a top-level introduction of a pragmatic tool. The idea is that, joined together, these three tools will form a unified toolset that is sophisticated enough to deal with the true complexity of your organisation.

(To make this article more digestible, this post will cover just Data, whilst subsequent instalments will cover Cloud and AI.)

Data

Right, so let’s start with data. As stated in Part One, computer networks and neural networks are naturally network-shaped, but most organisational data is still box-shaped. Data is the odd one out and is, therefore, both the weak link and the key to unlocking the full power of the other two.

The problem is not as bad for internet-native companies, as their business models have been based from the start on pulling most of their data from the web, and the web is, of course, one huge network. For example, Google has a network of links between web pages and Facebook has data on the links between the friends in your social network. At a largely unconscious level, this forced the tech giants, to their advantage, to think about their data in a less linear way.

But for most organisations, our linear thinking is reflected in the box-shaped data structures we have created, and, let’s be clear, this is the single biggest factor holding them back from the technological phase transition.

When you examine your organisation’s data, you are likely to see that it is scattered across a set of isolated tables: Excel spreadsheets and various databases. An Excel spreadsheet is a simple and familiar example of a table of data. So if, for example, we are capturing information about people and the products they have ordered, then we need to create three separate tables: one for people, another for orders and another for products.

Now, Excel tables are simple and flexible, but you can’t run an organisation on Excel (although many keep giving it a go). Why not? Because, individually, each of these tables is of limited use; it needs to be connected to the other tables to become more useful.

The ‘industrialised’ answer to connecting tables together is the Relational Database. All relational databases include unique IDs in each row, which can be used to stand in for all the information in that row, just as a barcode stands in for all the information about a shopping item. These unique keys are then used to connect the tables.

Notice how things got more complicated once we introduced connectivity; this just goes to reinforce the point that complexity increases with connectivity.

To help clarify the situation, let’s look at the example of capturing data about people and the products they have ordered in a set of example relational tables, each with its unique IDs:
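(The original tables are not reproduced here, so below is a minimal sketch of what they might look like, with hypothetical names, titles and IDs drawn from the examples used later in this article.)

```python
# A hypothetical reconstruction of the three relational tables.
# Every row carries a unique ID, and the Orders table refers to
# the other two tables through those IDs (foreign keys).

people = [
    {"person_id": 1111, "name": "Ben"},
    {"person_id": 1112, "name": "Alice"},
]

products = [
    {"product_id": 3333, "title": "Bridesmaids", "category": "Romcom", "stock": 12},
    {"product_id": 3334, "title": "Die Hard", "category": "Action", "stock": 7},
]

orders = [
    # Each order only makes sense once you follow its IDs back
    # into the People and Products tables.
    {"order_id": 2222, "person_id": 1111, "product_id": 3333},
    {"order_id": 2223, "person_id": 1112, "product_id": 3334},
]
```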

Don’t worry if you find this example hard to follow; the fact that it is a little hard to follow only serves to prove the point that linearised thinking makes an already complex situation appear extremely complicated.

Now imagine thousands of rows in each table, hundreds of tables in each database and often thousands of databases in each organisation and you will get an intuition of the very real problem we are trying to address here.

Within IT departments, box-shaped thinking like this is so ingrained that we don’t even notice as we parcel up our data into these separate tables. Even the meta-information about what the various columns and tables mean has to be parcelled into its own separate, and most often proprietary, ‘schema’ tables, kept apart from the rest of the data.

On the face of it, tables are simple, but make no mistake: there is a very real hidden integration cost to holding our data this way. Each time we create a new table, or, even worse, a whole new database, we add to the organisation’s overall fragmentation. The process of linking all the data back together into one system becomes a fraction more complicated, and over time these fractions add up. Each table is like another heavy chain draped around the organisation, like those worn by Marley’s ghost in A Christmas Carol. As a result, it is not unheard of for adding a single new column to cost a large organisation millions of dollars.

There are two fundamental problems at work here:

  • Tables start from the position of the isolated parts and only tack on the connections between the parts as an afterthought. Using IDs to link between tables is clunky and awkward, and it breaks down entirely when you want to connect to information held in another database (as each database usually creates its own isolated IDs).
  • The information that allows us to talk about people or products in the abstract is not represented as normal data, so I can’t easily find out more general and conceptual information such as “what does that stock column in the product table actually mean?” or “how are orders related to products?”. All I have is raw, flat data.

Enough of the doom and gloom. Here is how the network shape can be used as a ‘north star’ to find a way out. Relational databases are based on a branch of mathematics called Set Theory, in which a collection is treated as no more than the sum of its parts; networks are based on Graph Theory, which takes the connections between the parts into account. From here on in, I will use the terms ‘graph’ and ‘network’ interchangeably as they refer to the same thing.

The point to note is that treating the connections between the parts as first-class citizens is the way to begin to harness complexity because complexity is related to connectivity.

Let’s take the information held in the rows and columns of our separate tables, where we say “row 1111 in the Person table is linked to row 2222 in the Orders table”, and see if we can model the relationship explicitly by instead saying something more like “Ben ordered Bridesmaids”. In information terms, this is like moving from two-part encoding (row and column) to three-part encoding (item, relationship, object).

OK, let’s see that in action: we will run all our example tables through the grinder, chopping all the information up into simple three-part statements:
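(Again, the original table of statements is not shown here, so this is a sketch of what the grinder might produce for the first rows of our hypothetical tables.)

```python
# The same information, chopped into three-part statements of the
# form (item, relationship, object). The links between rows are now
# data in their own right, and so are the concepts ("Person",
# "Product") that the tables only ever implied.

triples = [
    ("person:1111", "isA", "Person"),
    ("person:1111", "hasName", "Ben"),
    ("order:2222", "isA", "Order"),
    ("order:2222", "orderedBy", "person:1111"),
    ("order:2222", "hasProduct", "product:3333"),
    ("product:3333", "isA", "Product"),
    ("product:3333", "hasTitle", "Bridesmaids"),
    ("product:3333", "hasCategory", "Romcom"),
]
```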

This table still looks quite incomprehensible, but by using a three-part encoding we have worked a little magic: we have moved from multiple sets to a single graph that incorporates the relationships as first-class citizens, and thus we have shifted from a set of separate 2D tables to one 3D network. It is therefore best to now visualise this same information directly as a network:
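(The network diagram itself is not reproduced here, but you can get a feel for it by loading the triples into a graph library; networkx is just one assumed choice.)

```python
import matplotlib.pyplot as plt
import networkx as nx

# Build a directed graph from the three-part statements above:
# each statement becomes one labelled edge.
G = nx.DiGraph()
for item, relationship, obj in triples:
    G.add_edge(item, obj, label=relationship)

# Draw it: the separate 2D tables are now one connected network.
pos = nx.spring_layout(G, seed=42)
nx.draw(G, pos, with_labels=True, node_size=1800, font_size=8)
nx.draw_networkx_edge_labels(G, pos, edge_labels=nx.get_edge_attributes(G, "label"))
plt.show()
```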

In many ways, this is a more natural way to think about your data as it is much closer to the way your brain works. Perhaps for the first time, you can begin to clearly see how our example of the ordering process is connected into one whole system. If so, then welcome to thinking in curves.

What is more, having the abstract concepts included in the network means we can zoom out from the messy details of the individual data items and look at how everything fits together at a higher conceptual level.

Finally, by viewing the raw data through the knowledge that has been embedded in this conceptual model we can gain potentially useful insights into how the system functions and flows.

For example, imagine that we grouped the people into households and counted the number of orders each household made, broken down by the category of movie ordered. Imagine that when we did this over time, we noticed a balancing feedback loop (remember them?) between the number of Action Movies ordered and the number of Romcom Movies ordered.

As we analyse the loop, perhaps we discover that romcoms and action movies oscillate in the same way that predator and prey numbers do in nature.
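(For the mathematically curious, that predator-prey oscillation is classically described by the Lotka-Volterra equations; mapping movie categories onto prey and predator is, of course, only an analogy.)

```latex
\begin{aligned}
\frac{dx}{dt} &= \alpha x - \beta x y && \text{(prey numbers, } x\text{)} \\
\frac{dy}{dt} &= \delta x y - \gamma y && \text{(predator numbers, } y\text{)}
\end{aligned}
```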

If you can work out the story behind this loop, then welcome to thinking in circles, or what is more formally known as Systems Thinking. To say that all this could have been done with tables misses the point: graphs were made for this job; they allow us to see the forest and not just the trees. The network shows us how the data flows through the connections between the parts, and it is this connectivity that turns the parts into a whole. In other words, networks allow us to see our organisation as a system, which in turn enables us to think and plan systemically.

By moving to three-part encoding, we have created a much more expressive and complex structure that is, at the same time, somehow simpler to understand.

To enable systems thinking, we have taken the same tabular data but built connectivity in from the start, and that has changed several important things, so let’s take a moment to break them down:

  • The separate tables and databases have now merged into one structure (what we could call one complex system)
  • The connections between the parts are now explicit, so we can seamlessly trace a path through the system from any part to another and see how they are connected (see the sketch after this list)
  • The model (i.e. the column and table names) is now included explicitly as part of the data and this conceptual model can be crafted in such a way as to capture abstract knowledge alongside the data
  • We have the data in a structure that can natively model feedback loops
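As a quick illustration of the path-tracing point above, here is how a route can be traced across what used to be three separate tables, reusing the hypothetical networkx graph built earlier:

```python
# Hop from a person to a movie category, crossing the old table
# boundaries as if they were never there.
undirected = G.to_undirected()  # ignore edge direction when tracing
path = nx.shortest_path(undirected, source="person:1111", target="Romcom")
print(" -> ".join(path))
# person:1111 -> order:2222 -> product:3333 -> Romcom
```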

In the industrial era, to enable widespread use and efficiency, we standardised the electrical supply. For similar reasons, in the Information Age, the same applies to data: each data source must supply a standard network-shaped ‘data socket’, and any application that wants to use and query that data can then use a standard network-shaped ‘data plug’.

To create a data plug, we must turn two-part data into three-part data that explicitly models the relationships. This is not rocket science: any developer worth his or her salt will be able to loop over the tables in your database and convert them into three-part statements that combine to form a network.
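Here is a minimal sketch of that loop, assuming a hypothetical SQLite database containing our example tables (the table and column names are illustrative, not prescriptive):

```python
import sqlite3

def table_to_triples(conn, table, id_column):
    """Convert one two-part (row and column) table into three-part statements."""
    # Note: in real code the table name should come from a whitelist,
    # never from user input.
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [col[0] for col in cursor.description]
    id_index = columns.index(id_column)
    triples = []
    for row in cursor:
        subject = f"{table}:{row[id_index]}"
        # Make the implicit concept ("Person", "Order", ...) explicit data.
        triples.append((subject, "isA", table.capitalize()))
        for column, value in zip(columns, row):
            if column != id_column and value is not None:
                # A fuller adapter would also turn foreign-key values
                # into node references such as "person:1111".
                triples.append((subject, column, value))
    return triples

# Hypothetical database and table names, matching the earlier examples.
conn = sqlite3.connect("orders.db")
graph = []
for table, id_column in [("person", "person_id"),
                         ("orders", "order_id"),
                         ("product", "product_id")]:
    graph.extend(table_to_triples(conn, table, id_column))
```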

With this simple step, our isolated boxes turn into connected fragments of a larger network. The next section on The Cloud will illustrate how these fragments can be linked to the fragments produced by other teams, but for now what is important is what we have in front of us: a simple way to generate networks. You can load these networks into graph databases, graph visualisation tools and even Graph Machine Learning algorithms. There is a lot of fun to be had here for those wishing to think in loops and curves. In short, we have the first of our practical tools: The Graph Adapter.

Tool Number One: The Graph Adapter

A graph adapter sits on top of each and every important source of information in the organisation. The adapter converts two-part statements into three-part statements and exposes a graph fragment (a network-shaped chunk of data that will seamlessly connect with other network-shaped chunks). The underlying database, file or API does not need to change; the adapter just exposes a network-shaped layer on top of it.
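To make that concrete, here is a toy sketch of what such an adapter might look like, reusing the table_to_triples helper from the previous sketch (the interface is an assumption, not a standard):

```python
class GraphAdapter:
    """Sits on top of an existing data source and exposes a graph fragment.

    The underlying source is read, never modified; the adapter is just
    a network-shaped layer over the top of it.
    """

    def __init__(self, conn, tables):
        self.conn = conn      # the existing, unchanged database connection
        self.tables = tables  # [(table_name, id_column), ...]

    def fragment(self):
        """Yield this source's data as three-part statements."""
        for table, id_column in self.tables:
            yield from table_to_triples(self.conn, table, id_column)

# Every important source gets its own adapter, and because all the
# fragments share the same three-part shape, they plug together
# like pieces of a jigsaw.
adapter = GraphAdapter(conn, [("person", "person_id")])
for triple in adapter.fragment():
    print(triple)
```

The adapter is deliberately thin: all the real work is in the conversion to three-part statements, which is the whole point.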

Part 3: Cloud
