Microchips and fleet management

The ultimate duo for smart product at scale

February 7, 2022

Risto Saari Solution Architect

We have seen how cloud based manufacturing has taken a huge step forward and you can find insights listed in our blog post The Industrial Revolution 6.0. Cloud based manufacturing is already here and extends IoT to the production floor. You could define a connected factory as a manufacturing facility that uses digital technology to allow seamless sharing of information between people, machines, and sensors.

if you haven’t read it yet there is great article Globalisation and digitalisation converge to transform the industrial landscape.

There is still much more than factories. Looking around you will notice a lot of smart products such as smart TVs, elevators, traffic light control systems, fitness trackers, smart waste bins and electric bikes. In order to control and monitor the fleet of devices we need rock solid fleet management capabilities that we will cover in another blog post.

This movement towards digital technologies, autonomous systems and robotics will require the most advanced semiconductors to come up with even more high-performance, low power consumption, low-cost, microcontrollers in order to carry complicated actions and operations at Edge. Rise in the Internet of Things and growing demand for automation across end-user industries is fueling growth in the global microcontroller market.

As Software has eaten the world and every product is a data product there will only be SaaS Companies.

Devices at the field must be robust to connectivity issues, in some cases withdraw -30 ~ 70°C operating temperatures, build on resilience and be able to work in isolation most of the time. Data is secured on device, it stays there and only relevant information is ingested to other systems. Machine-to-machine is a crucial part of the solutions and it’s nothing new like explained in blog post M2M has been here for decades.

Microchip powered smart products

Very fine example of world class engineering is Oura Ring. On this scale it’s typical to have Dual-core arm-processor: ARM Cortex based ultra low power MCU with limited memory to store data up to 6 weeks. Even at this size it’s packed with sensors such as infrared PPG (Photoplethysmography) sensor, body temperature sensor, 3D accelerometer and gyroscope.

Smart watches are using e.g. Exynos W920, a wearable processor made with the 5nm node, will pack two Arm Cortex-A55 cores and an Arm Mali-G68 GPU. Even on this small size it includes 4G LTE modem and a GNSS L1 sensor to track speed, distance, and elevation when watch wearers are outdoors.

Taking a mobile phone from your pocket it can be powered by the Qualcomm Snapdragon 888 capable of producing 1.8 – 3 GHz 8 cores with 3 MB Cortex-X1.

Another example is Tesla famous of having Self-Driving Chip for autonomous driving chip designed by Tesla the FSD Chip incorporates 3 quad-core Cortex-A72 clusters for a total of 12 CPUs operating at 2.2 GHz, a Mali G71 MP12 GPU operating 1 GHz, 2 neural processing units operating at 2 GHz, and various other hardware accelerators. infotainment systems can be built on the seriously powerful AMD Ryzen APU powered by RDNA2 graphics so you play The Witcher 3 and Cyberpunk 2077 when waiting inside of your car.

Artificial Intelligence – where machines are smarter

Just a few years ago, to be able to execute machine learning models at Edge on a fleet of devices was a tricky job due to lack of processing power, hardware restrictions and just pure amount of software work to be done. Very often the imitation is the amount of flash and ram available to store more complex models on a particular device. Running AI algorithms locally on a hardware device using edge computing where the AI algorithms are based on the data that are created on the device without requiring any connection is a clear bonus. This allows you to process data with the device in less than a few milliseconds which gives you real-time information.

Figure 1. Illustrative comparison how many ‘cycles’ a microprocessor can do (MHz)

The pure power of computing power is always a factor of many things like the Apple M1 demonstrated how to make it much cheaper and still gain the same performance compared to other choices. So far, it’s the most powerful mobile CPU in existence so long as your software runs natively on its ARM-based architecture. Depending on the AI application and device category, there are various hardware options for performing AI edge processing like CPUs, GPUs, ASICs, FPGAs and SoC accelerators.

Price range for microcontroller board with flexible digital interfaces will start around 4$ with very limited ML cabalities . Nowadays mobile phones are actually very powerful to run heavy compute operations thanks to purpose designed super boosted microchips.

GPU-Accelerated Cloud Services

Amazon Elastic Cloud Compute (EC2) is a great example where P4d instances AWS is paving the way for another bold decade of accelerated computing powered with the latest NVIDIA A100 Tensor Core GPU. The p4d comes with dual socket Intel Cascade Lake 8275CL processors totaling 96 vCPUs at 3.0 GHz with 1.1 TB of RAM and 8 TB of NVMe local storage. P4d also comes with 8 x 40 GB NVIDIA Tesla A100 GPUs with NVSwitch and 400 Gbps Elastic Fabric Adapter (EFA) enabled networking. In practice this means you do not have to take coffee breaks so much and wait for nothing when executing Machine Learning (ML), High Performance Computing (HPC), and analytics. You can find more on P4d from AWS.

Top 3 benefits of using Edge for computing

There are clear benefits why you should be aware of Edge computing:

1. Reduced costs where costs for data communication and bandwidth costs will be reduced as fewer data will be transmitted.

2. Improved security when you are processing data locally, the problem can be avoided with streaming without uploading a lot of data to the cloud.

3. Highly responsive where devices are able to process data really fast compared to centralized IoT models.

Convergence of AI and Industrial IoT Solutions

According to a Gartner report, “By 2027, machine learning in the form of deep learning will be included in over 65 percent of edge use cases, up from less than 10 percent in 2021.” Typically these solutions have not fallen into Enterprise IT – at least not yet. It’s expected Edge Management becomes an IT focus by utilizing IT resources to optimize cost.

Take a look on Solita AI Masterclass for Executives how we can help you to bring business cases in life and you might be interested taking control of your fleet with our kickstart. Let’s stay fresh minded !

SQL Santa for Factory and Fleet

Awesome SQL Is coming To Town

December 2, 2021

Risto Saari Solution Architect

Timo 'Tipi' Pitkänen Data Engineer

We have a miniseries before Christmas coming where we talk S-Q-L, /ˈsiːkwəl/ “sequel”. Yes, the 47 years old domain-specific language used in programming and designed for managing data. It’s very nice to see how old faithful SQL is going stronger than ever for stream processing as well the original relational database management purposes.

What is data then and how that should be used ? Take a look on article written in Finnish “Data ei ole öljyä, se on lantaa”

We will show you how to query and manipulate data across different solutions using the same SQL programming language.

The Solita Developer survey has become a tradition here at Solita and please check out the latest survey. It’s easy to see how SQL is dominating in a pool of many cool programming languages. It might take an average learner about two to three weeks to master the basic concepts of SQL and this is exactly what we will do with you.

Data modeling and real-time data

Operative technology (OT) solution have been real time from day one despite it’s also a question of illusion of real-time when it comes to IT systems. We could say that having network latency 5-15 ms towards Cloud and data processing with single-digit millisecond latency irrespective of the scale is considered near real time. This is important for Santa Claus and Industry 4.0 where autonomous fleet, robots and real-time processing in automation and control is a must have. Imagine situation where Santa’s autonomous sleigh with smart safety systems boosted computer vision (CV) able bypass airplanes and make smart decisions would have time of unit seconds or minutes – that would be a nightmare.

A data model is an abstract model that organizes elements of data and standardizes how they relate to one another and to the properties of real-world entities.

It’s easy to identify at least conceptual, logical and physical data models, where from the last one we are interested the most in this exercise to store and query data.

Back to the Future

Dimensional model heavily development by Ralph Kimball was breakthrough 1996 and had concepts like fact tables, dimension and ultimately creating a star schema. Challenge of this modeling is to keep conformed dimensions across the data warehouse and data processing can create unnecessary complexity.

One of the main driving factors behind using Data Vault is for both audit and historical tracking purposes. This methodology was developed by Daniel (Dan) Linstedt in early 2000. It has gain a lot of attraction being able to support especially modern cloud platform with massive parallel processing (MPP) of data loading and not to worry so much of which entity should be loaded first. Possibility even create data warehouse from scratch and just loading data in is pretty powerful when designing an idempotent system.

Quite typical data flow looks like picture above and like you already noticed this will have impact on how fast data is landed into applications and users. Theses for Successful Modern Data Warehousing are useful to read when you have time.

Data Mesh ultimate promise is to eliminate the friction to deliver quality data for producers and enable consumers to discover, understand and use the data at rapid speed. You could imagine this as data products in own sandboxes with some common control plane and governance. In any case to be successful you need expertise from different areas such as business, domain and data. End of the day Data Mesh does not take a strong position on data modeling.

Wide Tables / One Big Table (OBT) that is basically nested and denormalized tables is one modeling that is perhaps the mostly controversy. Shuffling data between compute instances when executing joins will have negative impact on performance (yes, you can e.g. replicate dimensional data to nodes and keep fact table distributed which will improve performance) and very often operational data structures produced by micro-services and exchanged over API are closer to this “nested” structure. Having same structure and logic for batch SQL as streaming SQL will ease your work.

Breaking down the OT data items to multiple different sub optimal data strictures inside IT systems will loose the single, atomic data entity. Having said this it’s possible to ingest e.g. Avro files to MPP, keeping the structure same as original file as and using evolving schemas to discovery new attributes. That can be then use as baseline to load target layers such as Data Vault.

One interesting concept called Activity Schema that is sold us as being designed to make data modeling and analysis substantially simpler, faster.

Contextualize data

For our industrial Santa Claus case one very important thing is how to create inventory and contextualize data. One very promising path is an augmented data catalog that will cover a bit later. For some reason there is material out there explaining how IoT data has no structure which is just incorrect. The only reason I can think is that kind of data asset was not fit to traditional data warehouse thinking.

Something to take a look is Apache Avro that is a language-neutral data serialization system, developed by Doug Cutting, the father of Hadoop. The other one is JSON is an open standard file format and data interchange format that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and arrays. This is not solution for data modeling even more you will notice later on this blog post how those are very valuable on steaming data and having schema compared to other formats like CSV.

Business case for Santa

Like always everything starts with Why and solution discovery phase, what we actual want to build and would that have a business value. At Christmas time our business is around gifts and how to deliver those on time. Our model is a bit more simplified and will include operational technology systems such as assets (Santa’s workshop) and fleet (sleighs) operations. There might always be something broken so few maintenance needs are pushed to technicians (elfs). Distributed data platform is used for supply chain and logistics analytics to remove bottlenecks so business owners can be satisfied (Santa Claus and the team) and all gifts will be delivered to the right address just in time.

Case Santa’s workshop

We can later use OEE to calculate that workshop performance in order to produce high quality nice gifts. Data is ingested real time and contextualized so once a while Santa and the team will check how we are doing. In this specific case we know that using Athena we can find relevant production line data just querying the S3 bucket where all raw data is stored already.

Day 1 – creating a Santa’s table for time series data

Let’s create a very basic table to capture all data from Santa’s factory floor. You will notice there are different data types like bigint and string. You can even add comments to help others to later find what kind of data field should include. In this case raw data is Avro but you do not have to worry about that so let’s go.

CREATE EXTERNAL TABLE `raw`(

`seriesid` string COMMENT 'from deserializer',

`timeinseconds` bigint COMMENT 'from deserializer',

`offsetinnanos` bigint COMMENT 'from deserializer',

`quality` string COMMENT 'from deserializer',

`doublevalue` double COMMENT 'from deserializer',

`stringvalue` string COMMENT 'from deserializer',

`integervalue` int COMMENT 'from deserializer',

`booleanvalue` boolean COMMENT 'from deserializer',

`jsonvalue` string COMMENT 'from deserializer',

`recordversion` bigint COMMENT 'from deserializer'

) PARTITIONED BY (

`startyear` string, `startmonth` string,

`startday` string, `seriesbucket` string

)

Day 2 – query Santas’s data

Now we have a table and how to query that one ? That is easy with SELECT and taking all fields using asterix. It’s even possible to limit that to 10 rows which is always a good practice.

SELECT * FROM "sitewise_out"."raw" limit 10;

Day 3 – Creating a view from query

View is a virtual presentation of data that will help to organize assets more efficiently. One golden rule is still now to create many views on top of other views and keep the solution simple. You will notice that CREATE VIEW works nicely and now we have timeinseconds and actual factory floor value (doublevalue) captured. You can even drop the view using DROP command.

CREATE OR REPLACE VIEW "v_santa_data"

AS SELECT timeinseconds, doublevalue FROM "sitewise_out"."raw" limit 10;

Day 4 – Using functions to format dates to Santa

You noticed that timeinseconds is in Epoch so let’s use functions to have more human readable output. So we add a small from_unixtime function and combine that with date_format to have formatted output like we want. Perfect, now we know from which data Santa Claus manufacturing data originated.

SELECT date_format(from_unixtime(timeinseconds),'%Y-%m-%dT%H:%i:%sZ') , doublevalue FROM "sitewise_out"."raw" limit 10;

Day 5 – CTAS creating a table

Using CTAS (CREATE TABLE AS SELECT) you can even create a new physical table easily. You will notice that Athena specific format has been added that you do not need on relational databases.

CREATE TABLE IF NOT EXISTS new_table_name

WITH (format='Avro') AS

SELECT timeinseconds , doublevalue FROM "sitewise_out"."raw" limit 10;

Day 6 – Limit the result sets

Now I want to limit the results to only those where the quality is Good.Adding a WHERE clause I can have only those rows printed to my output – that is cool!

SELECT * FROM "sitewise_out"."raw"  where quality='GOOD' limit 10;

Case Santa’s fleet

Now we jump into Santa’s fleet meaning sleights and there is few attribute we are interested like SleightD , IsSmartLock, LastGPSTime , SleightStateID , Latitude and Longitude. This data is time series that is ingested into our platform near real-time. Let’s use AWS Timestream service which is fast, scalable, and serverless time series database service for IoT and operational applications. A time series is a data set that tracks a sample over time.

Day 7 – creating a table for fleet

You will notice very quickly that data model looks different than on relational database cases. There is no need beforehand to define table structure just executing CreateTable is enough.

Day 8- query the latest record

You can override time field using e.g. LastGPSTime, in this example we use time when data was ingested in, so getting the last movement of sleigh would be like this.

SELECT * FROM movementdb.tbl_movement
ORDER BY time DESC
LIMIT 1

Day 9- let’s check the last 24 hours movement

We can use time to filter our results and ordering on descending same time.

SELECT *
FROM "movementdb"."tbl_movement" 
WHERE time > ago(24h) 
ORDER BY time DESC

Day 10- latitude and longitude

We can find out latitude and longitude information easily and please note we are using IN operator to bet both to query result.

SELECT measure_name,measure_value::double,time 
FROM "movementdb"."tbl_movement" 
WHERE time > ago(24h) 
and measure_name in ('Longitude','Latitude')
ORDER BY time DESC LIMIT 10

Day 11- last connectivity info

Now we use 2 things so we group data based on sleigh id and find the maximum value. This will tell when sleigh was connected and sending data to our platform. There are plenty of functions to choose from so please check documentation.

SELECT greatest (time) as last_time, sleighId
FROM "movementdb"."tbl_movement" 
WHERE time > ago(24h) 
and measure_name = ('LastGPSTime')
group by sleighId,greatest (time)

Day 12- using conditions for smart lock data

CASE is very powerful to manipulate the query results so in this example we use that do indicate better if sleigh had smart lock.

SELECT time, measure_name,
CASE 
WHEN measure_value::boolean = true THEN 'Yes we have a smart lock'
ELSE 'No we do not that kind of fancy locks'
END AS smart_lock_info
FROM "movementdb"."tbl_movement"
WHERE time between ago(1d) and now() 
and measure_name='IsSmartLock'

Day 13- finding the latest battery level on each fleet equipment

This would be a bit more complex so we have one query to find max value of battery level and then we later join that to base data so on each record we know the latest battery level in the past 24 hours. Please notice we are using INNER join in this example.

WITH latest_battery_time as (
select 
d_sleighIdentifier, 
max(time) as latest_time 
FROM 
"movementdb"."tbl_movement" 
WHERE 
time between ago(1d) 
and now() 
and measure_name = 'Battery' 
group by 
d_sleighIdentifier
) 
SELECT 
b.d_sleighIdentifier, 
b.measure_value :: double as last_battery_level 
FROM 
latest_battery_time a 
inner join "movementdb"."tbl_movement" b on a.d_sleighIdentifier = b.d_sleighIdentifier 
and b.time = a.latest_time 
WHERE 
b.time between ago(1d) 
and now() 
and b.measure_name = 'Battery'

Day 14- distinct values

The SELECT DISTINCT statement is used to return only distinct (different) values. This is so create and also very misused when removing duplicates etc. when actual problem can be on JOIN conditions.

SELECT 
DISTINCT (d_sleighIdentifier) 
FROM 
"movementdb"."tbl_movement"

Day 15- partition by is almost magic

The PARTITION BY clause is a subclause of the OVER clause. The PARTITION BY clause divides a query’s result set into partitions. The window function is operated on each partition separately and recalculate for each partition. This is almost a magic and that can be used in several ways like in this example identify last sleigh Id.

select 
d_sleighIdentifier, 
SUM(1) as total, 
from 
(
SELECT 
*, 
first_value(d_sleighIdentifier) over (
partition by d_sleighTypeName 
order by 
time desc
) lastaction 
FROM 
"movementdb"."tbl_movement" 
WHERE 
time between ago(1d) 
and now()
) 
GROUP BY 
d_sleighIdentifier, 
lastaction

Day 16- interpolation (values of missing data points)

Timestream and few other IoT services supports linear interpolation, enabling to estimate and retrieve the values of missing data points in their time series data. This will come very handy when our fleet is not connected all the time, in this example we used it for our smart sleight battery level.

WITH rawseries as (
select 
measure_value :: bigint as value, 
time as d_time 
from 
"movementdb"."tbl_movement" 
where 
measure_name = 'Battery'
), 
interpolate as (
SELECT 
INTERPOLATE_LINEAR(
CREATE_TIME_SERIES(d_time, value), 
SEQUENCE(
min(d_time), 
max(d_time), 
1s
)
) AS linear_ts 
FROM 
rawseries
) 
SELECT 
time, 
value 
FROM 
interpolate CROSS 
JOIN UNNEST(linear_ts)

Case Santa’s master data

Now we jump into Master Data when factory and fleet is up are covered. In this very complex supply chain system customer data is very typical transactional data and in this exercise we keep it very atomic having stored only very basic info into DynamoDB that is a fully managed, serverless, key-value NoSQL database designed to run high-performance applications at any scale. We use this data to on IoT data streams for join, filtering and other purposes in fast manner. Good to remember that DynamoDB is not build for complex query patterns so it’s best on it’s original key=value data query pattern.

Day 17- adding master data

We upload our customer data into DynamoDB so called “items” based om the list received from Santa.

{
"customer_id": {
"S": "AJUUUUIIIOS"
},
"category_list": {
"L": [
{
"S": "Local Businesses"
},
{
"S": "Restaurants"
}
]
},
"homepage_url": {
"S": "it would be here"
},
"founded_year": {
"N": "2021"
},
"contract": {
"S": "NOPE"
},
"country_code": {
"S": "FI"
},
"name": {
"S": ""
},
"market_stringset": {
"SS": [
"Health",
"Wellness"
]
}
}

Day 18- query one customer item

Amazon DynamoDB supports PartiQL, a SQL-compatible query language, to select, insert, update, and delete data in Amazon DynamoDB. That is something we will use too speed up things. Let’s first query one customer data asset.

SELECT * FROM "tbl_customer" where customer_id='AJUUUUIIIOS'

Day 18- update kids information

Using the same PartiQL you can update item to have new attributes with one go.

UPDATE "tbl_customer" 
SET kids='2 kids and one dog' 
where customer_id='AJUUUUIIIOS'

Day 19- contains function

Now we can easily check that form marketing data who was interested on Health using CONTAINS. Many moderns database engines have native support for semi-structured data, including: Flexible-schema data types for loading semi-structured data without transformation. If you are not already familiar please take a look on AWS Redshift and Snowflake.

SELECT * FROM "tbl_customer" where contains("market_stringset", 'Health')

Day 20- inserting a new customer

Using familiar SQL like it’s very straightforward to add one new item.

INSERT INTO "tbl_customer" value {'name' : 'name here','customer_id' : 'A784738H'}

Day 21- missing data

Using a special MISSING you can find those where some attribute is not present easily.

SELECT * FROM "tbl_customer" WHERE "kids" is MISSING

Day 22- export data into s3

With one command you can export data from DynamoDB to S3 so let’s do that one based on documentation. AWS and others do have support for something called Federated Query where you can run SQL queries across data stored in relational, non-relational, object, and custom data sources. This federated feature we will cover later with You.

Day 23- using S3 select feature

Now you have data stored to S3 bucket and there is holder called /data so you can even use SQL to query S3 stored data. This will find relevant information for customer_id.

Select s.Item.customer_id from S3Object s

Day 24- s3 select to find right customer

You can even use customer Id to restrict data returned to you.

Select s.Item.customer_id from S3Object s where s.Item.customer_id.S ='AJUUUUIIIOS'

That’s all, I hope you get some glimpse how useful SQL is even you have different services and you might first think this will never be possible to use same kind of language of choice. Please do remember when some day You might be building next generation artificial intelligence and analysis platform with us knowing few data modeling techniques and SQL is a very good start.

You might be interested Industrial equipment data at scale for factory floor or manage your fleet at scale so let’s keep fresh mind and have a very nice week !

The Industrial Revolution 6.0

Strength of will, determination, perseverance, and acting rationally in the face of adversity

November 25, 2021

Risto Saari Solution Architect

Marko Siitonen IoT Architect

The Industrial Revolution

The European Commission has taken a very active role to define Industry 5.0 and it complements Industry 4.0 for transformation of sustainable, human-centric and resilient European industry.

Industry 5.0 provides a vision of industry that aims beyond efficiency and productivity as the sole goals, and reinforces the role and the contribution of industry to society. https://ec.europa.eu/info/research-and-innovation/research-area/industrial-research-and-innovation/industry-50_en

Finnish industry is affected by the pandemic, the fragmentation global supply chains and dependency of suppliers all around the world. Finnish have something called “sisu”. It’s a Finnish term that can be roughly translated into English as strength of will, determination, perseverance, and acting rationally in the face of adversity. That might be one reason why in Finland group of people are already defining Industry 6.0 and also one of the reasons we wanted to share our ideas using blog posts such as:

It’s not well defined where the boundaries on each industrial revolution really are. We can argue that first Industry 1.0 was around 1760 when transition to new manufacturing processes using water and steam was happening. Roughly 1840 the second industrial revolution was referred to as “The Technological Revolution” where one component was superior electrical technology which allowed for even greater production. Industry 3.0 introduced more automated systems onto the assembly line to perform human tasks, i.e. using Programmable Logic Controllers (PLC).

Present

The Fourth Industrial Revolution (Industry 4.0) will incorporate storage systems and production facilities that can autonomously exchange information. How to deliverer and purchase any service or product will have on these 3 dimensions two categories: physical and digital.

IoT has a bit of inflation as a word and the few biggest hype cycles are past life- which is a good thing. The Internet of things (IoT) plays very important role to enable smart connected devices and extend the possibility to Cloud computing. Companies are already creating cyber-physical systems where machine learning (ML) is built into product-centered thinking. Few of the companies have a digital twin that serves as the real-time digital counterpart of a physical object or process.

In Finland with a long history of factory, process and manufacturing companies this is reality and bigger companies are targeting for faster time to market, quality and efficiency. Rigid SAP processes combined with yearly budgets are not blocking future looking products and services – we are past that time. There are great initiatives for sensor networks and edge computing for environment analysis. Software enabled intelligent products, new better offerings based on real usage and how to differentiate on market is everyday business to many of us in the industrial domain.

Future

“When something is important enough, you do it even if the odds are not in your favor.” – Elon Musk

World events have pushed industry to rethink how to build and grow business in a sustainable manner. Industry 5.0 is being said to be the revolution in which man and machine reconcile and find ways to work together to improve the means and efficiency of production. Being on stage or watching your fellow colleagues you can hear words like human-machine co-creative resilience, mass-customization, sustainability and circular economy. Product complexity is increasing at the same time with ever-increasing customer expectations.

Industry 6.0 exists only in whitepapers but that does not mean that “customer driven virtualized antifragile manufacturing” could be real some day. Hyperconnected factories and dynamic supply chains would most probably benefit all of us. Some are referring to industrial change same way as hyperscalers such as AWS are doing for selling cloud capacity. There are challenges for sure like “Lot Size One” to be economically feasible. One thing is for sure that all models and things will merge, blur and convergence.

Building the builders

“My biggest mistake is probably weighing too much on someone’s talent and not someone’s personality. I think it matters whether someone has a good heart.” – Elon Musk

One fact is that industrial life is not super interesting for millennials. It looks old fashioned so to have a future professional is a must have. Factory floor might not be as interesting as it was a few decades ago. Technology possibilities and cloud computing will boost to have more different people to have interest towards industrial solutions. A lot of ecosystems exist with little collaboration and we think it’s time to change that by reinventing business models, solutions and onboarding more fresh minded people for industrial solutions.

That is one reason we have packaged kickstarts to our customers and anyone interested can grow with us.

Industrial equipment data at scale – Kickstart for factory
Manage your fleet at scale – Kickstart for fleet
Accelerate cloud data transformation – Kickstart for cloud data transformation

Illusion of real-time

Magic is the only honest profession. A magician promises to deceive you and he does.

November 17, 2021

Risto Saari Solution Architect

Cloud data transformation

Tipi shared thoughts on how data assets could be utilized on Cloud. We had few question after blog post and one of those was “how to tackle real time requirements ?

Let’s go real time ?

Real-time business intelligence is a concept describing the process of delivering business intelligence or information about business operations as they occur. Real time means near to zero latency and access to information whenever it is required.

We all remember those nightly batch loads and preprocessing data – waiting a few hours before data is ready for reports. Someone is looking if sales numbers are dropped and the manager will ask quality reports from production. Report is evidence to some other team what is happening in our business.

Let’s go back to the definition that says “information whenever it is required” so actually for some of the team(s) even one week or day can be realtime. Business processes and humans are not software robots so taking action based on any data will take more than a few milliseconds so where is this real time requirement coming from ?

Marko had a nice article related to OT systems and Factory Floor and Edge computing. Any factory issue can be a major pain and downtime is not an option and explained how most of the data assets like metrics and logs must be available immediately in order to recover and understand the root cause.

Hyperscalers and real time computing

In March 2005, Google acquired the web statistics analysis program Urchin, later known as Google Analytics. That was one of the customer facing solutions to gather massive amount of data. Industrial protocols like Modbus from 1970 was designed to work real time on that time and era. Generally speaking real time computing has three categories:

Hard – missing a deadline is a total system failure.
Firm – infrequent deadline misses are tolerable, but may degrade the system’s quality of service. The usefulness of a result is zero after its deadline.
Soft – the usefulness of a result degrades after its deadline, thereby degrading the system’s quality of service.

So it’s easy to understand that airplane turbine and rolling 12 months sales forecast have different requirements. .

What is the cost of (data) delay ?

“A small boat that sails the river is better than a large ship that sinks in the sea.”― Matshona Dhliwayo

We can simply estimate the value a specific feature would bring in after its launch and multiply this value with the time it will take to build. That will tell the economic impact that postponing a task will have.

High performing teams can do cost of delay estimation to understand which task should take first. Can we calculate and understand the cost of delayed data? How much that will cost to your organization if service or product must be postponed because you are missing data or you can not use it.

Start defining real-time

You can easily start discussing what kind of data is needed to improve customer experience. Real time requirements might be different for each use case and that is totally fine. It’s a good practice to specify near real time requirements in factual numbers and few examples. It’s good to remember that end to end can have totally different meanings. Working with OT systems for example the term First Mile is used when protect and connect OT systems with IT.

Any equipment failure must be visible to technicians at site in less than 60 seconds. ― Customer requirement

Understand team topologies

Incorrect team topology can block any near real time use cases. That means that adding each component and team deliverable to work together might end up having unexpected data delays. Or in the worst case scenario a team is built too much around one product / feature that will have come a bottleneck later when building more new services.

Data as a product refers to an idea where the job of the data team is to provide the data that the company needs. Data as a Service team partners with stakeholders and have more functional experience and are responsible for providing insight as opposed to rows and columns. Data Mesh is about the logical and physical interconnections of the data from producers through to consumers.

Team topologies will have a huge impact on how data driven services are built and can data land to business case purposes just on the right time.

Enable Edge streaming and APIs capabilities

On cloud services like AWS Kinesis is great, it is a scalable and durable real-time data streaming service that can continuously capture gigabytes of data per second. Apache Kafka is a framework implementation of a software bus using stream-processing. Apache Spark is an open-source unified analytics engine for large-scale data processing.

I am sure that at least one of these you are already familiar with. In order to control data flow we have two parameters: amount of messages and time. Which will come first will se served.

Is your data solution idempotent and able to handle data delays ? ― Customer requirement

Modern purpose-built databases have capability to process streaming data. Any extra layer of data modeling will add a delay for data consumption. On Edge we typically run purpose-built robust database services in order to capture all factory floor events with industry standard data models.

Site and Cloud API is a contact between different parties and will improve connectivity and collaboration. API calls on Edge works nicely and you can have data available in less than 70-300ms from Cloud endpoint (example below). Same data is available on Edge endpoint where client response is even faster so building factory floor applications is easy.

curl --location --request GET 'https://data.iotsitewise.eu-west-1.amazonaws.com/properties/history?assetId=aa&maxResults=1&propertyId=pp --header 'X-Amz-Date: 20211118T152104Z' --header 'Authorization: AWS4-HMAC-SHA256 Credential=xxx, SignedHeaders=host;x-amz-date, Signature=xxxx

Quite many databases has built-in Data API. It’s still good to remember that underlying engine, data model and many factors will determine how scalable solution really is.

AWS GreenGrass StreamManager is a component that enables you to process data streams to transfer to the AWS Cloud from Greengrass core devices. Other services like Firehose is supported using specific aws.greengrass.KinesisFirehose component. These components will support also building Machine Learning (ML) features on Edge as well.

Conclusion

Business case will define the requirement of real time. Build your near real time capabilities according to your future proof architecture – adding real time capabilities later might come almost impossible.

If business case is not clear enough what should I do ? Maybe a cup of tea, relax and read blog post from Johannes The gap between design thinking and business impact

You might be interested our kickstarts Accelerate cloud data transformation and Industrial equipment data at scale

Let’s stay fresh-minded !

Cloud data transformation

Data silos and unpredicted costs preventing innovation

November 12, 2021

Risto Saari Solution Architect

Timo 'Tipi' Pitkänen Data Engineer

Cloud database race ?

One of the first cloud services was S3 launched in 2006. AWS Hadoop based Amazon SimpleDB was released in 2007 and after that there have been many nice cloud database products from multiple cloud hyperscalers. Database as a service (DBaaS) has been a prominent service when customers are looking for scaling, simplicity and taking advantage of the ecosystem. It has been estimated that the Cloud database and DBaaS market was estimated to be USD 12,540 Million by 2020, so no wonder there is a lot of activity. Looking from a customer point of view this is excellent news when the cloud database service race is on and new features are popping up and same time usage costs are getting lower. I can not remember the time when creating a global solution backed by a database would be so cost efficient as it is now.

Why should I move data assets to the Cloud ?

There are few obvious reasons like rapid setup, cost efficiency, scaling solutions and integration to other Cloud services. That will give nice security enforcement in many cases where old school username and password is not used like in some on premises systems still do.

“No need to maintain private data centers”, “No need to guess capacity”

Cloud computing instead typical on premises setup is distributed by nature, so computing and storage are separated. Data replication to other regions is supported out of the box in many solutions, so data can be stored as close as possible to end users for best in class user experience.

In the last few years even more database service can work seamlessly with on premises and cloud. Almost all data related cases have aspects of machine learning nowadays and Cloud empowers teams to enable machine learning in several different ways: in built into database services, purpose-built services or using native integrations. Just using the same development environment and using industry standard SQL you can do all ML phases easily. Database integrated AutoML aims to empower developers to create sophisticated ML models without having to deal with all the phases of ML – that is a great opportunity for any Citizen data scientist !

Purpose build databases to support diverse data models

Beauty of cloud comes rapidly with flexibility and pay as you go model with very real time cost monitoring. You can cherry pick the best purpose-built database (relational, key-value, document, in-memory, graph, time series, wide column, and ledger databases.) to suit your use case, data models and avoid building one big monolithic solution.

Snowflake is one of the few enterprise-ready cloud data warehouses that brings simplicity without sacrificing features and can be operated on any major cloud platform. Amazon Relational Database Service (Amazon RDS) makes it easy to set up, operate, and scale to any relational database in the cloud. Amazon Timestream is a nice option for serverless, super fast time series processing and near real time solutions. You might have a Hadoop system or running a non-scalable relational database on premises and think about how to get started on a journey for improved customer experience and digital services?

Success for your cloud data migration

We have worked with our customers to build a Data Migration strategy. That will help in understanding the migration options, create a plan and also validate future proof architecture.

Today we share with you here a few tips that might help you when planning data migrations.

Employee experience – embrace your team, new possibilities and replace pure technical approach to include commitment from your team developers. Domain knowledge of data assets and applications is very important and building a trust to new solutions from day one.
Challenge your partner of choice. There is more than lift and shift or creating all from scratch options. It might be that all data assets are not needed or useful anymore. Our team is working on a vertical slicing approach where the elephant is splitted to manageable pieces. Using state of the art accelerator solutions we can make an inventory using real life metrics. Let’s make sure that you can avoid the big bang and current systems can operate without impact even when building new systems.
Bad design and technical debt of legacy systems. It’s very typical that old systems’ performance and design can be broken already. That is something which is not visible to all stakeholders and when doing the first Cloud transformation all that will come visible will pop up. Prepare yourself for surprises – take that as an opportunity to build more robust architecture. Do not try to fix all problems at once !
Automation to the bones. In order to be able to try and replay data make sure everything is fully automated including database, data loading and integrations. So, making a change is fun and not something to be careful of. It’s very hard to build DataOps to on premises systems because of the nature of operating models, contracts and hardware limitations. In Cloud those are not the blockers anymore.
Define workloads and scope ( no low hanging fruits only) . Taking one database and moving that to the Cloud can not be used as any baseline when you have hundreds of databases. Metrics from the first one should not be used as a matrix multiplied by the amount of databases when thinking about the whole project scope. Take a variety of different workloads and solutions, some even the hard one to first sprint. It’s better to start immediately and not wait for any target systems because on Cloud that is totally redundant.
Welcome Ops model improvement. On Cloud database metrics of performance (and any other kind) and audit trails are all visible so creating a more proactive and risk free ops model is at your fingertips. My advice is not to copy the existing Ops model with the current SLA as it is. High availability and recovery are different things – so do not mix those.
Going for meta driven DW. In some cases choosing state of the art automated warehouse like Solita Agile Data Engine (ADE) will boost your business goals when you are ready to take a next step.

Let’s kick the Cloud Data transformation ongoing !

Take advantage of cloud when building digital services with less money and faster with our Accelerate cloud data transformation kickstart

You might be interested also Migrating to the cloud isn’t difficult, but how to do it right?

Industrial data contextualization at scale

Shaping the future of your data culture with contextualization

November 7, 2021

Risto Saari Solution Architect

My colleague and good friend Marko had interesting thought on Smart and Connected factories and how to get data out of the complex factory floor systems and enable machine learning capabilities on Edge and Cloud . In this blog post I will try to open a bit more on data modeling and how to overcome a few typical pitfalls – that are not always only data related.

Creating super powers

Research and development (R&D) include activities that companies undertake to innovate and introduce new products and services. In many cases if company is big enough R&D is separate from other units and in some cases R is separated from D as well. We could call this as separation of concerns – so every unit can 100% focus on their goals.

What separates R&D and Business unit ? Let’s first pause and think about what business is doing. A business unit is an organizational structure such as a department or team that produces revenues and is responsible for costs. Perfect so now we have company wide functions (R&D, business) to support being innovative and produce revenue.

Hmmm, something is still missing – how to scale digital solutions in a cost efficient way so we can have profit (row80) in good shape ? Way back in 1978 information technology (IT) was used first time. The Merriam-Webster Dictionary defines information technology as “the technology involving the development, maintenance, and use of computer systems, software, and networks for the processing and distribution of data.” One the IT functions is to provide services with cost efficiency on global scale.

Combine these super powers: business, R&D and IT we should produce revenue, be innovative and have the latest IT systems up and running to support company goals – in real life this is much more complex, welcome to the era of data driven product and services.

Understanding your organization structure

To be data driven, the first thing is to actually look around in which maturity level my team and company is. There are so many nice models to choose from: functional, divisional, matrix, team, and networking. Organizational structure can easily become a blocker in how to get new ideas to market quickly enough. Quite many times Conway’s law kicks in and software or automated systems end up “shaped like” the organizational structure they are designed in or designed for.

One example of Conway’s law in action, identified back in 1999 by UX expert Nigel Bevan, is corporate website design: Companies tend to create websites with structure and content that mirror the company’s internal concerns

When you look at your car dashboard, company web sites or circuit board of embedded systems, quite many times you can see Conway’s law in action. Feature teams, tribes, platform teams, enabler team or a component team – I am sure you have at least one of these to somehow try to tackle the problem of how an organization should be able to produce good enough products and services to market on time. Calling same thing with Squad(s) will not solve the core issue. Neither to copy one top-down driven model from Netflix to your industrial landscape.

Why does data contextualization matter?

Based on facts mentioned above, creating industrial data driven services is not easy. Imagine you push a product out to the market that is not able to gather data from usage. Other team is building a subscription based service for the same customers. Maybe someone already started to sell that to customers. This solution will not work because now we have a product out and not able to invoice customers from usage. Refactoring of organizations, code and platforms is needed to accomplish common goals together. A new Data Platform as such is not improving the speed of development automatically or making customers more engaged.

Contextualization means adding related information to any data in order to make it more useful. That does not mean data lake, our new CRM or MES. Industrial data is not just another data source on slides, creating contextual data enables to have the same language between different parties such as business and IT.

A great solution will help you understand better what we have or how things work, it’s like a car you have never driven and still you feel that this is exactly how it should be even if it’s not close to your old vehicle at all. Industrial data assets are modeled in a certain way and that will enable common data models from floor to cloud, enabling scalable machine learning without varying data schema changes.

Our industrial AWS SiteWise data models for example are 100% compatible with modern data warehousing platforms like Solita Agile Data Engine out of the box. General blueprints of data models have failed in this industry many times, so please always look at your use case also from bottom up and not only the other way round.

Curiosity and open minded

I have been working on data for the last 20 years and on the industrial landscape half of that time. Now it’s great to see how Nordics companies are embracing company culture change, talking about competence based organization, asking from consultants more than just a pair of hands and creating teams of superpowers.

How to get started on data contextualization ?

Gather your team and check how much of time it will take to have one idea to customer (production) – is our current organization model supporting it ?
Look models and approach that you might find useful like intro for data mesh or a deep dive – the new paradigm you might want to mess with (and remember that what works for someone else might not be perfect to you)
We can help with with AWS SiteWise for data contextualization. That specific service is used to create virtual representations of your industrial operation with AWS IoT SiteWise assets.

I have been working on all major cloud platforms and focusing on AWS. Stay tuned for the next Blog post explaining how SiteWise is used for data contextualization. Let’s keep in touch and stay fresh minded.

Our Industrial data contextualization at scale Kickstart

Microchip powered smart products

Artificial Intelligence – where machines are smarter

GPU-Accelerated Cloud Services

Top 3 benefits of using Edge for computing

Convergence of AI and Industrial IoT Solutions

Data modeling and real-time data

Back to the Future

Contextualize data

Business case for Santa

Case Santa’s workshop

Day 1 – creating a Santa’s table for time series data

Day 2 – query Santas’s data

Day 3 – Creating a view from query

Day 4 – Using functions to format dates to Santa

Day 5 – CTAS creating a table

Day 6 – Limit the result sets

Case Santa’s fleet

Day 7 – creating a table for fleet

Day 8- query the latest record

Day 9- let’s check the last 24 hours movement

Day 10- latitude and longitude

Day 11- last connectivity info

Day 12- using conditions for smart lock data

Day 13- finding the latest battery level on each fleet equipment

Day 14- distinct values

Day 15- partition by is almost magic

Day 16- interpolation (values of missing data points)

Case Santa’s master data

Day 17- adding master data

Day 18- query one customer item

Day 18- update kids information

Day 19- contains function

Day 20- inserting a new customer

Day 21- missing data

Day 22- export data into s3

Day 23- using S3 select feature

Day 24- s3 select to find right customer

Building the builders

Industrial equipment data at scale – Kickstart for factory

Manage your fleet at scale – Kickstart for fleet

Accelerate cloud data transformation – Kickstart for cloud data transformation

Cloud data transformation

Let’s go real time ?

Hyperscalers and real time computing

What is the cost of (data) delay ?

Conclusion

Cloud database race ?

How to get started on data contextualization ?