Introduction to Edge AI with HPE Ezmeral Data Fabric

In this blog post, we look at how technology has shifted from on-premises data centers to the cloud, and from the cloud to the edge. We then explain what a data fabric is, introduce HPE Ezmeral Data Fabric, and investigate its capabilities. Finally, we discuss Edge AI with HPE Ezmeral Data Fabric.

To see what Edge AI is, we need to take a deeper look at the history of data processing over time.

The evolution of data-intensive workloads

On-premises data centers

Back in 2000, almost everything ran locally in on-premises data centers. This meant that everything from management to maintenance rested on the company’s shoulders. That worked for a while, but as businesses grew more dependent on the internet, they faced several challenges. Here are some of the most important ones:

Infrastructure inflexibility

Over time, new services and technologies are released, so the infrastructure may need to be updated or the services changed.

This is especially challenging when hardware changes are involved. The only solution seems to be purchasing the desired hardware and then configuring it manually. It gets worse if, at some point, we realize that the changes are not beneficial. In that case, we have to start all over again!

This inflexibility wastes money and energy.

How about scaling on demand?

A good business invests a lot of money to satisfy its customers. Customer satisfaction can be seen from different angles, but one of the most important is always having the capacity to respond to clients as quickly as possible. This rule also applies to the digital world: even loyal customers might change their minds if they see that the servers are not responding because they have reached their maximum capacity.

Therefore, the demand needs to be estimated. The challenging part is forecasting the few days a year when demand goes very high. Demand forecasting has many aspects and is not limited to the digital traffic from clients to servers; a good estimate of the demand for a particular item in the inventory is also highly valuable.

Black Friday is a good example of such a situation. 

There are two ways to cope with this unusually high demand:

  1. Purchase extra hardware to ensure that there will be no delay in responding to the customers’ requests. This strategy seems to be safe, but it has some disadvantages. First, since the demand is high on only certain days, many resources are in idle mode for a long time. Second, the manual configuration of the newly purchased devices should be considered. All in all, it is not a wise decision financially.
  2. Ignore that demand and let customers experience downtime while they wait for servers to become available. As is easy to guess, this is not good for the reputation of the business.

This inflexibility is hard to address, and it gets worse over time.

Expansion 

One might want to expand the business geographically. Along with marketing, there are some technical challenges. 

The issue with the geographical expansion is the delay that is caused by the physical distance between the clients and servers. A good strategy is to distribute the data centers around the world and locate them somewhere closer to the customers.

Configuring these new data centers, along with their security, networking, and data management, can be very hard.

Cloud Computing

Given the challenges of on-premises data centers, the first evolution of data-intensive workloads happened around 2010, when third-party cloud providers such as Amazon Web Services and Microsoft Azure became widely available.

They provided companies with infrastructure and services on a pay-as-you-go basis.

Cloud Computing solved many problems with on-premises approaches. 

Risto and Timo have a great blog post about “Cloud Data Transformation” and I recommend checking it out to know more about the advantages of Cloud Computing.

Edge Computing

Over time, more applications were developed, and Cloud Computing seemed to be the right solution for them. Around 2020, however, Edge Computing started to get more and more attention as the solution for a group of newly introduced, more challenging applications.

The common feature of these applications was that they were time-sensitive. Cloud Computing can perform poorly in such cases, since transmitting data to the cloud is time-consuming in itself.

The basic idea of Edge Computing is to process data close to where it is produced. This decentralization has some benefits such as:

Reducing latency

As discussed earlier, the main advantage of Edge Computing is that it reduces latency by eliminating the data transmission between the source and the cloud.

Saving Network Bandwidth 

Since the data is processed on the edge nodes, network bandwidth is saved. This matters a lot when a continuous stream of data needs to be processed.
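To make the bandwidth argument concrete, here is a minimal sketch in Python. It assumes a hypothetical edge node that receives one temperature reading per second and forwards only a per-window summary to the cloud. The function names, the window size, and the sample data are illustrative assumptions, not part of any product API.

```python
from statistics import mean


def summarize_window(readings):
    """Reduce a window of raw sensor readings to a small summary record."""
    return {
        "count": len(readings),
        "min": min(readings),
        "max": max(readings),
        "mean": mean(readings),
    }


def summarize_stream(readings, window_size):
    """Split a stream into fixed-size windows and summarize each one."""
    return [
        summarize_window(readings[i:i + window_size])
        for i in range(0, len(readings), window_size)
    ]


# Hypothetical stream: 120 readings (two minutes at one reading per second).
raw = [20.0 + (i % 10) * 0.1 for i in range(120)]
summaries = summarize_stream(raw, window_size=60)

# Two summary records leave the edge node instead of 120 raw readings.
print(len(raw), "raw readings ->", len(summaries), "summaries")
```

How much can be saved in practice depends on how much detail the downstream consumers actually need; a summary like this is only appropriate when the raw readings are not required in the cloud.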

Privacy-preserving

Another essential advantage of Edge Computing is that the data does not need to leave its source. Therefore, it can be used in applications where sending data to the cloud or to on-prem data centers would not comply with regulations.

AI applications

Many real-world use cases in the industry were introduced along with the advances in Artificial Intelligence. 

There are two options for deploying the models: Cloud-based AI and Edge AI. There is also another categorization for training the model (centralized and decentralized) but it is beyond the scope of this blog.

Cloud-based AI

With this approach, everything happens in the cloud, from data gathering to training and deploying the model.

Cloud-based AI has many advantages, such as cost savings. It is usually much cheaper to use cloud infrastructure for training a model than to purchase physical GPU-enabled computers.

The workflow of such an application is that after the model is deployed, new unseen data from the business unit (or wherever the source of data is) will be sent to the cloud, the decision will be made there and it will be sent back to the business unit.

Edge AI

As you might have guessed, Edge AI addresses the time-sensitivity issue. This time, the data gathering and model training steps still happen in the cloud, but the model is deployed on the edge nodes. This change in the workflow not only saves network bandwidth but also reduces latency.
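The Edge AI workflow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the model is a tiny logistic regression, its parameters are made-up values standing in for the result of a cloud training job, and the `EdgeModel` class is hypothetical rather than part of any library.

```python
import json
import math

# Assumption: these parameters were produced by a training job in the cloud
# and shipped to the edge node as a small JSON artifact.
MODEL_ARTIFACT = json.dumps({"weights": [0.8, -0.5], "bias": 0.1})


class EdgeModel:
    """A tiny logistic-regression model that runs entirely on the edge node."""

    def __init__(self, artifact):
        params = json.loads(artifact)
        self.weights = params["weights"]
        self.bias = params["bias"]

    def predict(self, features):
        """Return the probability of the positive class for one sample."""
        score = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return 1.0 / (1.0 + math.exp(-score))


model = EdgeModel(MODEL_ARTIFACT)

# Inference happens locally: the raw sensor values never leave the device.
probability = model.predict([1.0, 0.2])
alert = probability > 0.5
print(f"probability={probability:.3f} alert={alert}")
```

Only the decision (or an aggregate of decisions) would need to travel back to the cloud, which is where the latency and bandwidth savings come from.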

Edge AI opens the doors to many real-time AI-driven applications in the industry. Here are some examples: 

  • Autonomous Vehicles
  • Traffic Management Systems
  • Healthcare systems
  • Digital Twins

Data Fabric

So far, we have discussed a bit about the concepts of Cloud/Edge computing, but as always, the story is different in real-world applications.

We talked about the benefits of cloud computing, but it is important to ask ourselves these questions:

  • What would be the architecture of having such services in the Cloud/Edge?
  • What is the process of migration from on-prem to cloud? What are the challenges? How can we solve them? 
  • How can we manage and access data in a unified manner to avoid data silos?
  • How can we orchestrate distributed servers or edge nodes in an optimized and secure way?
  • How about monitoring and visualization?

Many companies came up with their own manual solutions to the above questions, but businesses need a better way, so that they can focus on creating value rather than dealing with these issues. This is where Data Fabric comes into play.

Data Fabric is an approach for managing data in an organization. Its architecture consists of a set of services that make accessing data easier regardless of its location (on-prem, cloud, edge). This architecture is flexible, secure, and adaptive.

Data Fabric can reduce the integration time, the maintenance time, and the deployment time. 

Next, we will be talking about the HPE Ezmeral Data Fabric (Data Fabric is offered as a solution by many enterprises and the comparison between them is beyond the scope of this blog).

HPE Ezmeral Data Fabric

HPE Ezmeral Data Fabric is an Edge to Cloud solution that supports industry-standard APIs such as REST, S3, POSIX, and NFS. It also has an ecosystem package that contains many open-source tools such as Apache Spark and allows you to do data analysis. 

You can find more information about the benefits of using HPE Ezmeral Data Fabric here.

One eye-catching part of the platform is named “Data Fabric Event Stream”. This is the key feature that allows us to develop Edge AI applications with HPE Ezmeral Data Fabric.

Edge AI with HPE Ezmeral Data Fabric – application

An Edge AI application should contain at least one platform for orchestrating the broker cluster such as Kafka, some tools such as Apache Spark, and a data store. Setting this up might not be as easy as it seems, especially in large-scale applications with millions of sensors, thousands of edge sites, and the cloud.

Fortunately, with HPE Ezmeral Data Fabric Event Stream, this task can be done much more easily. We will go through it by demonstrating a simple application that we developed.

Once you set up the cluster, the only thing you need to do is to install the client on the edge nodes, connect them to the cluster (with a single-line maprlogin command), and then enable the services that you want to use.

The event stream is already there, and again it takes just a single command to create a stream and then to create topics in it.

For the publisher (also called producer), you need to just send the data from any source to the broker, and for the subscriber (also called consumer) the story is the same.
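The publisher and subscriber roles can be illustrated with a small in-memory stand-in. This sketch does not use the HPE Ezmeral Data Fabric or Kafka client APIs; the `InMemoryStream` class, the topic name, and the message layout are all hypothetical and only show the shape of the interaction.

```python
from collections import defaultdict


class InMemoryStream:
    """Stand-in for an event stream: topics hold ordered messages."""

    def __init__(self):
        self._topics = defaultdict(list)

    def publish(self, topic, message):
        """Producer side: append a message to a topic."""
        self._topics[topic].append(message)

    def consume(self, topic, offset=0):
        """Consumer side: read all messages from the given offset onward."""
        return self._topics[topic][offset:]


stream = InMemoryStream()

# Producer: an edge sensor publishes readings to a topic.
for value in (21.5, 21.7, 22.1):
    stream.publish("temperature", {"sensor": "s1", "value": value})

# Consumer: reads from the start, then resumes from where it left off.
first_batch = stream.consume("temperature")
stream.publish("temperature", {"sensor": "s1", "value": 22.4})
second_batch = stream.consume("temperature", offset=len(first_batch))

print(len(first_batch), len(second_batch))
```

A real broker adds persistence, partitioning, and delivery guarantees on top of this idea, but the producer and consumer code keeps the same basic shape: publish to a topic, read from an offset.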

To use open-source tools such as Apache Spark (or in our case Spark Structured Streaming), you just need to install them on the mapr client, and the connection between the client and the cluster will be established automatically. You can then run a script on the edge nodes and access data in the cluster.

Storing data is just as simple as the previous steps. A table can be created with a single command, and writing data to it is also straightforward.

Conclusion

To sum up, Edge AI has a promising future, and leveraging it with different tools such as Data Fabric can be a game changer.

Thank you for reading this blog! I would also like to invite you to our talk about the benefits of Edge Computing in Pori on 23/09/2022!

More information can be found here.

Sadaf Nazari.

MLOps: from data scientist’s computer to production

MLOps refers to the concept of automating the lifecycle of machine learning models from data preparation and model building to production deployment and maintenance. MLOps is not just a machine learning platform or technology; it requires a complete change of mindset in how machine learning models are developed, towards the best practices of software development. In this blog post we introduce this concept and its benefits for anyone having or planning to have machine learning models running in production.

Operationalizing data platforms, DataOps, has been among the hottest topics during the past few years. Recently, MLOps has also become one of the hottest topics in the field of data science and machine learning. Building operational data platforms has made data available for analytics purposes and enabled the development of machine learning models on a completely new scale. While the development of machine learning models has expanded, the processes of maintaining and managing the models have not kept the same pace. This is where the concept of MLOps becomes relevant.

What is MLOps?

Machine learning operations, or MLOps, is a concept similar to DevOps (or DataOps), but specifically tailored to the needs of data science and, more specifically, machine learning. DevOps was introduced to software development over a decade ago. DevOps practices aim to improve application delivery by combining the entire life cycle of the application (development, testing and delivery) into one process, instead of having a separate development team hand over the developed solution to the operations team to deploy. The definite benefits of DevOps are shorter development cycles, increased deployment velocity, and dependable releases.

Just as DevOps aims to improve application delivery, MLOps aims to productionalize machine learning models in a simple and automated way.

As for any software service running in production, automating the build and deployment of ML models is equally important. Additionally, machine learning models benefit from versioning and monitoring, and from the ability to retrain and deploy new versions of the model. This matters not only for reliability when data is updated but also from the perspective of transparency and AI ethics.

Why do you need MLOps?

Data scientists’ work is research and development, and it essentially requires skills in statistics and mathematics as well as programming. It is iterative work of building and training to generate various models. Many teams have data scientists who can build state-of-the-art models, but their process for building and deploying those models can be entirely manual. It might happen locally, on a personal laptop with copies of data, and the end product might be a CSV file or PowerPoint slides. These types of experiments don’t usually create much business value if they never go live in production. And that’s where data scientists in many cases struggle the most, since engineering and operations skills are often not among their core competences.

In this type of development, the best case is that the model ends up in production by a data scientist handing over the trained model artifacts to the ops team to deploy, while the ops team might lack knowledge of how best to integrate machine learning models into their existing systems. After deployment, the model’s predictions and actions might not be tracked, so model performance degradation and other behavioral drift cannot be detected. At best, your data scientist monitors model performance manually and manually retrains the model with new data, always with another manual handover at deployment.

The described process might work for a short time when you only have a few models and a few data scientists, but it is not scalable in the long term. The disconnection between development and operations is what DevOps originally was developed to solve, and the lack of monitoring and re-deployment is where MLOps comes in.

ML model development lifecycle. The process consists of development, training, packaging and deploying, automating and managing and monitoring.

 

How can MLOps help?

Instead of going back and forth between the data scientists and the operations team, integrating MLOps into the development process enables quicker cycles of deployment and optimization of algorithms, without always requiring a huge effort when adding new algorithms to production or updating existing ones.

MLOps can be divided into multiple practices: automated infrastructure building, versioning important parts of data science experiments and models, deployments (packaging, continuous integration and continuous delivery), security and monitoring.

Versioning

In software development projects it is typical that source code, its configurations and also infrastructure code are versioned. Tracking and controlling changes to the code enables roll-backs to previous versions in case of failures and helps developers to understand the evolution of the solution. In data science projects source code and infrastructure are important to version as well, but in addition to them, there are other parts that need to be versioned, too.

Typically, a data scientist runs training jobs multiple times with different setups. For example, the hyperparameters and the features used may vary between runs, and they affect the accuracy of the model. If information about the training data, the hyperparameters, the model itself, and the model accuracy for different combinations is not saved anywhere, it can be hard to compare the models and choose the best one to deploy to production.
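As a rough illustration of what versioning a training run can mean, the sketch below records the hyperparameters, the features, a fingerprint of the training data, and the resulting accuracy for each run, and then picks the best run. The function names, the registry structure, and the accuracy values are assumptions made up for this example; a real project would typically use a dedicated experiment tracking tool.

```python
import hashlib
import json


def data_fingerprint(rows):
    """Hash the training data so each run records exactly what it was trained on."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def record_run(registry, hyperparameters, features, rows, accuracy):
    """Append one training run to the registry and return its record."""
    run = {
        "run_id": len(registry) + 1,
        "hyperparameters": hyperparameters,
        "features": features,
        "data_version": data_fingerprint(rows),
        "accuracy": accuracy,
    }
    registry.append(run)
    return run


def best_run(registry):
    """Pick the run with the highest recorded accuracy."""
    return max(registry, key=lambda run: run["accuracy"])


# Hypothetical runs; the accuracy values are made up for illustration.
training_rows = [{"x": 1, "y": 0}, {"x": 2, "y": 1}]
registry = []
record_run(registry, {"learning_rate": 0.1}, ["x"], training_rows, accuracy=0.81)
record_run(registry, {"learning_rate": 0.01}, ["x"], training_rows, accuracy=0.86)

print(best_run(registry)["run_id"])
```

Because both runs share the same data fingerprint, any difference in accuracy can be attributed to the changed hyperparameter rather than to a silent change in the data.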

Templates and shared libraries

Data scientists might lack knowledge of infrastructure development or networking, but if there is a ready template and framework, they only need to adapt the steps of a process. Templating and using shared libraries frees up data scientists’ time so they can focus on their core expertise.

Existing templates and shared libraries that abstract underlying infrastructure, platforms and databases, will speed up building new machine learning models but will also help in on-boarding any new data scientists.

Project templates can automate the creation of the infrastructure that is needed for running the preprocessing or training code. When, for example, infrastructure building is automated with infrastructure as code, it is easier to build different environments and be sure they’re similar. This usually also means that infrastructure security practices are automated and don’t vary from project to project.

Templates can also have scripts for packaging and deploying code. When the libraries used are mostly the same in different projects, those scripts very rarely need to be changed and data scientists don’t have to write them separately for every project.

Shared libraries mean less duplicate code and a smaller chance of bugs in repeated tasks. They can also hide details about the database and platform from data scientists, who can use ready-made functions for tasks such as reading from and writing to a database or saving the model. Versioning can be written into shared libraries and functions as well, which means it’s not up to the data scientist to remember which things need to be versioned.

Deployment pipeline

When deploying either a more traditional software solution or an ML solution, the steps in the process are highly repetitive, but also error-prone. An automated deployment pipeline in a CI/CD service can take care of packaging the code, running automated tests, and deploying the package to a selected environment. This not only reduces the risk of errors in deployment but also frees up time from deployment tasks for actual development work.

Tests are needed in deployment of machine learning models as in any software, including typical unit and integration tests of the system. In addition to those, you need to validate data and the model, and evaluate the quality of the trained model. Adding the necessary validation creates a bit more complexity and requires automation of steps that are manually done before deployment by data scientists to train and validate new models. You might need to deploy a multi-step pipeline to automatically retrain and deploy models, depending on your solution.
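The data and model validation steps mentioned above could look something like the following gate in a pipeline. This is a minimal sketch: the function names, the required columns, and the accuracy threshold are hypothetical choices, and a real pipeline would run checks like these as automated test stages in the CI/CD service.

```python
def validate_rows(rows, required_columns):
    """Return a list of problems found in the input data (empty if none)."""
    problems = []
    for index, row in enumerate(rows):
        for column in required_columns:
            if row.get(column) is None:
                problems.append(f"row {index}: missing '{column}'")
    return problems


def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def deployment_gate(rows, predictions, labels, required_columns, min_accuracy):
    """Approve deployment only if the data is valid and the model is good enough."""
    problems = validate_rows(rows, required_columns)
    if problems:
        return False, problems
    score = accuracy(predictions, labels)
    if score < min_accuracy:
        return False, [f"accuracy {score:.2f} is below threshold {min_accuracy:.2f}"]
    return True, []


# Hypothetical hold-out set and model predictions.
rows = [{"age": 34, "income": 4200}, {"age": 51, "income": 3900}]
approved, reasons = deployment_gate(
    rows, predictions=[1, 0], labels=[1, 0],
    required_columns=["age", "income"], min_accuracy=0.9,
)
print(approved, reasons)
```

Returning the reasons alongside the decision makes it easy for the pipeline to fail with a readable message instead of a bare exit code.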

Monitoring

After the model is deployed to production, some people might think it remains functional and decays like any traditional software system. In fact, machine learning models can decay in more ways than traditional software systems. In addition to monitoring the performance of the system, the performance of the models themselves needs to be monitored as well. Because machine learning models make assumptions about the real world based on the data used for training them, the accuracy of a model may decrease when the surrounding world changes. This is especially true for models that try to model human behavior. Decreasing accuracy means that the model needs to be retrained to reflect the surrounding world better, and monitoring ensures that retraining is done neither too seldom nor too often. By tracking summary statistics of your data and monitoring the performance of your model, you can send notifications or roll back when values deviate from the expectations set at the time of the last model training.
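Tracking summary statistics can start very simply. The sketch below compares the mean of a live feature against the mean recorded at training time and flags a deviation; the function names, the threshold of three standard errors, and the sample values are assumptions for illustration, and this is only one of many possible drift checks.

```python
from statistics import mean, stdev


def baseline_stats(training_values):
    """Summary statistics captured at training time."""
    return {"mean": mean(training_values), "stdev": stdev(training_values)}


def has_drifted(baseline, live_values, max_z=3.0):
    """Flag drift when the live mean is far from the training mean.

    The distance is measured in standard errors of the live sample mean.
    """
    standard_error = baseline["stdev"] / len(live_values) ** 0.5
    z_score = abs(mean(live_values) - baseline["mean"]) / standard_error
    return z_score > max_z


# Hypothetical feature values; in practice these come from logged inputs.
baseline = baseline_stats([10.0, 11.0, 9.0, 10.5, 9.5, 10.0])
print(has_drifted(baseline, [10.2, 9.8, 10.1, 9.9]))   # similar to training
print(has_drifted(baseline, [14.0, 15.0, 14.5, 15.5]))  # shifted upward
```

A check like this can trigger a notification or a retraining job. Note that it assumes the training values are not all identical, since a zero standard deviation would make the comparison undefined.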

Applying MLOps

Bringing MLOps thinking to machine learning model development enables you to actually get your models to production if you are not there yet, makes your deployment cycles faster and more reliable, reduces manual effort and errors, and frees up your data scientists’ time from tasks that are not their core competences for actual model development work. Cloud providers (such as AWS, Azure or GCP) are especially good places to start implementing MLOps in small steps, with ready-made software components you can use. Moreover, they offer all the CPU and GPU capacity needed for model training on a pay-as-you-go basis.

If your AI journey is still in an early phase (PoCs don’t need heavy processes like this), a robust development framework and pipeline infrastructure might not be the highest priority. However, any effort invested in automating the development process from an early phase will pay off later and reduce machine learning technical debt in the long run. Start small and change the way you develop ML models towards MLOps by at least moving the development work on top of version control and automating the steps for retraining and deployment.

DevOps was born as a reaction to the systematic organization needed around rapidly expanding software development, and now the same problems are faced in the field of machine learning. Take the needed steps towards MLOps, as was done successfully with DevOps before.

Greetings from the Bay Area – IBM Think 2019 Part 1

Our Solita crew participated in IBM Think held in San Francisco in February. IBM Think is an annual technology and business conference, where the latest technology trends and new product releases from IBM are introduced.

IBM Think in San Francisco was a huge technology event with approximately 27 000 attendees, thousands of different sessions, presentations and keynotes held in different venues in San Francisco.

Due to the size of the conference we wanted to focus on certain key areas: AI, machine learning and analytics. There were about 500 data and analytics presentations to choose from. Topics covered areas such as data science, AI, business and planning analytics, hybrid data management, governance and integration. IBM Cloud Private for Data alone had 18 sessions where this new product was presented.

Solita has strong expertise in the area of analytics (Cognos Analytics & Planning Analytics), and we wanted to strengthen our competence and learn about upcoming releases of those products. We had a chance to meet IBM’s offering management, discuss new features, and give feedback. There were also several hands-on labs where one could test upcoming features of the products.

Although Planning Analytics (PA) was a bit of a sidekick compared to buzzwords like AI and Blockchain, the PA sessions provided good information about the new features and on-going development. In addition, there were several different client presentations providing insights into their CPM solutions. Interestingly, many of those presentations were still focusing on TM1 technology and not on Planning Analytics even though TM1 support will end on 30th of September 2019.

AI and data science were strongly present on the IBM Think agenda. Success stories on AI implementations were told, for example, by Carrefour (a retail chain that wanted to optimize existing and new supermarket investment decisions), Nedbank (a bank that used predictive maintenance to optimize ATM services), Red Eléctrica de España (an electrical company that wanted to predict generation and optimize production) and Daimler (a truck manufacturer using AI to comprehend the complexity of product configurations).

AI project best practices were also shared in many of the sessions. Best practices included starting with a quick-win use case to gain buy-in from management and business, having a business sponsor for the project, measuring clear KPIs and business impact, ensuring good-quality data, creating effective teams, choosing the right tools, etc. These are all principles we definitely agree with and that are already implemented in Solita data projects.

What else did we learn at IBM Think 2019? Deep dive into learnings coming up!

Why are deep learning models so popular?

Deep learning (i.e. big neural networks) plays a central role in the ongoing boom of artificial intelligence and data science. Last year, a partly neural network based AI beat a human grandmaster for the first time in Go, a complex board game. Judging by the hype, it feels like deep neural networks can be found in every other state-of-the-art AI solution.

In practice they have their downsides, but although they are not the be-all-end-all of machine learning algorithms, neural networks are versatile and useful. In our client projects, we have leveraged deep learning in image recognition and multivariate time series forecasting tasks, for example. What qualities make neural networks efficient?

On a high level, a neural network (and most other supervised machine learning algorithms) can be seen as a device that takes in numerical inputs and spits out numerical outputs.

When the model is built, the general structure of what is inside the device, as well as the structure of its inputs and outputs, is specified. At this point, the model already “works” in the sense that it can take inputs to produce outputs, but the results are random.

Then, the model is trained by repeatedly feeding it actual data, that is, correct answers to the problem at hand. During this training process the parameters of the model (the bolts and cogs inside the device) are adjusted in very small increments. In time, the algorithm converges and learns the relationships in the training data. In the end, the model learns how to map input data to outputs using logic similar to that underlying the training data it was fed. After training, the model can be used to make predictions based on new input data, something that the model has not seen before.
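The training loop described above can be illustrated with the simplest possible model, a straight line with two parameters fitted by gradient descent. This is a sketch for intuition only: the `train` function, the learning rate, the number of epochs, and the toy data are illustrative assumptions, and a real neural network has far more parameters but is adjusted on the same principle.

```python
def train(samples, learning_rate=0.05, epochs=1000):
    """Fit y = weight * x + bias with plain gradient descent on squared error."""
    weight, bias = 0.0, 0.0  # arbitrary starting point: predictions are wrong
    n = len(samples)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in samples:
            error = (weight * x + bias) - y
            grad_w += 2 * error * x / n
            grad_b += 2 * error / n
        # Small step against the gradient: the "very small increments".
        weight -= learning_rate * grad_w
        bias -= learning_rate * grad_b
    return weight, bias


# Training data generated from y = 2x + 1 (the "correct answers").
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
weight, bias = train(samples)

# Predict for an input the model has not seen during training.
print(round(weight * 4.0 + bias, 2))
```

After enough small adjustments, the parameters settle close to the values that generated the data, so the prediction for the unseen input lands near the true relationship.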

What goes in, what goes out?

The inputs to the device could be virtually anything that can be represented as arrays of numbers: images, time series data, videos, free text articles after being transformed into numerical representations, you name it. The outputs can also take various shapes.

The output could be a single number, say a weather forecast on a given hour. Or it could be an array of several figures, like the pixel coordinates of an identified suspected cancer in an x-ray image received as input.

The end result does not even have to be numeric, even though a neural network only crunches numbers. For example, the network can produce an array of likelihood estimates that are converted into a categorical classification in the end.

Given a picture, the model could say it is a cat with 80% certainty, a dog with 15% certainty or a car with 5% certainty. Although almost anything can be represented in numerical format in some manner, deciding how the numerical representations of the inputs and outputs are actually done is usually not easy. This preprocessing step is an important part of the data science workflow.
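The conversion from numeric outputs to a categorical answer can be sketched with a softmax function followed by picking the most likely label. The raw scores and the label names below are hypothetical; softmax is a common choice for this step, though not the only one.

```python
import math


def softmax(scores):
    """Turn raw network outputs into probabilities that sum to 1."""
    largest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores, labels):
    """Return the most likely label together with its probability."""
    probabilities = softmax(scores)
    best = max(range(len(labels)), key=lambda i: probabilities[i])
    return labels[best], probabilities[best]


# Hypothetical raw outputs of a network for one picture.
label, probability = classify([2.0, 0.3, -0.8], ["cat", "dog", "car"])
print(label, round(probability, 2))
```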

Anatomy of a neural network

In the case of neural networks, what is inside the device is a large number of interconnected simple processing units called neurons. A neuron takes a number, squeezes it through some non-linear function and then outputs the result. In a deep neural network, the neurons are organized into layers that succeed each other. The input signal is first sent to the first layer of neurons, which send their outputs to all the neurons in the next layer. This process continues layer after layer until the output layer is reached. The construct is inspired by biological neurons, the main components of the central nervous system.

The connections between neurons are also given so-called weights, which are basically little valves that determine how much of each input the unit propagates up the network. Adjusting these weights is what actually happens during the training process and what allows neural networks to fit specific problems. A deep neural network could contain millions of weights, so the good news is that adjusting the weights can be automated efficiently (using backpropagation and gradient descent methods).
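The layered structure described above can be written down in a few lines. The sketch below passes an input through a tiny fully connected network; the weights are hypothetical, untrained values, and `tanh` is just one possible choice for the non-linear function.

```python
import math


def neuron(inputs, weights, bias):
    """Weighted sum of the inputs squeezed through a non-linear function."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return math.tanh(total)


def layer(inputs, weight_rows, biases):
    """Every neuron in the layer sees all outputs of the previous layer."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


def forward(inputs, layers):
    """Pass the signal through the layers one after another."""
    signal = inputs
    for weight_rows, biases in layers:
        signal = layer(signal, weight_rows, biases)
    return signal


# Hypothetical, untrained weights: 2 inputs -> 3 hidden neurons -> 1 output.
network = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),
    ([[0.7, -0.5, 0.2]], [0.05]),
]
output = forward([1.0, 2.0], network)
print(output)
```

Training would adjust the numbers in `network`; the structure of the forward pass stays the same.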

Advantages of neural networks

The idea of training a machine to transform numerical representations of inputs to outputs applies to most machine learning models, so what makes neural networks special? Three reasons come to mind.

First, the structure of a neural network is specified only very broadly before the model is trained, which gives a lot of room for the model to adjust during training.

In statistical terms, large neural networks can be thought of as being somewhere in between parametric and nonparametric models. In a parametric model, for instance a traditional regression, the number of parameters in the model is strictly determined before fitting the model. In a nonparametric model, the model structure is determined more broadly, and the training process can adjust the number of parameters as well as their values. Thus, in a nonparametric model, there is more freedom for the model structure to adjust to the problem being solved. In neural networks, the number of parameters (weights) is strictly determined beforehand, which would imply that they are parametric models. However, the number of weights can be enormous and the training process could allow many of the weights to zero out, effectively blocking certain paths through the network. For these reasons, deep and broad neural networks resemble nonparametric models in practice. The nonparametric nature gives neural networks structural freedom to adapt to many kinds of problems.

Second, since neural networks consist of chained little functions that perform nonlinear transformations at each step, they are inherently nonlinear models.

This allows them to model many problems better since many real-world relationships are nonlinear. (For example, the area of a square-shaped field increases quadratically instead of linearly as its width increases.)

Third, the vanilla version of a neural network that has been discussed so far can be adjusted to make it a better fit to certain problems.

Convolutional neural networks, for example, are good at making broad abstractions from very detailed inputs and work especially well for image recognition problems. In recurrent neural networks, the neuron layers have feedback loops, which essentially means that the network is able to remember previous inputs. This trait makes them a good fit for time series forecasting and natural language processing.

Advanced deep learning models – the ones that are used in solutions that are able to beat humans in complex games or drive vehicles – combine these basic architectures. For instance, the model could begin with convolutional layers that are good at abstracting information. This network could be followed by a recurrent network that has a memory and the ability to learn sequential and spatial relationships. Finally, a regular fully connected layer could produce the final output.

The downsides

If deep learning is so powerful, then why don’t we dump all other machine learning algorithms in its favor? One of the biggest caveats of neural networks is the fact that they are black boxes: it is usually impossible to intuitively explain why a neural network has given a certain prediction. Sometimes, the intermediate outputs that the neurons produce can be analyzed and explained, but many times this is not the case. Other algorithms, such as traditional linear models or tree-based models like random forest, can usually be analyzed and explained better.

Another downside is that neural networks need large amounts of training data and can take a long time to learn. This was a huge problem in the past but has been mitigated somewhat by technological advancement. One of the reasons for the surge in deep learning’s popularity can be attributed to improvements in GPU computational power and the advancements of cloud computing. Today, it is possible to train complex deep learning models in a matter of hours, not weeks.

How to get started?

Even though they are technically formidable when you dive into details, neural networks are not that hard to experiment with. A good way to get your hands dirty, if you know Python or R, is to first google and follow through hands-on coding tutorials, like this one. It is also a good idea to register at www.kaggle.com and build a neural network solution to one of the simpler problems, say, “Titanic: Machine Learning from Disaster”.

Once familiar with the basics, try something a little more advanced by following along these inspiring blog posts, for example: