Friday, 3 November 2017

The Vector of Differentiation Is Shifting Again


Have we reached peak phone? That is, does the new iPhone X represent a plateau for hardware innovation in the smartphone product category?
I would argue that we are indeed standing on the summit of peak “phone as hardware”: While Apple’s newest iPhone offers some impressive hardware features, it does not represent the beginning of the next 10 years of the smartphone, as Apple claims.
The better question, then, is where do phones go from here?
To understand the future of phones, it helps to look at the history of phone innovation. We have seen this movie before. When focal dimensions of innovation change, incumbents often get left behind. More specifically, as we shift from hardware-based innovation to differentiation around AI-driven technologies, market leaders like Apple should be on high alert.
Innovation in technology product categories tends to proceed along a specific dimension—a “vector of differentiation.”
Players pursue innovation along a vector of differentiation until the vector runs out of steam. This happens for two reasons: limits to innovation along the vector of focus and the ability of competitors to catch up with market leaders. When that happens, the focus of innovation shifts to a different vector and new market leaders emerge. We have seen this pattern several times in mobile phone innovation over the past three decades.
The physical dimensions of the phone constituted the first vector of mobile phone differentiation. Phones were big, clunky bricks of technology until the small, sleek StarTAC arrived in 1996, establishing Motorola as a market leader, soon joined by Nokia. Innovation continued along this dimension, with phones getting smaller and smaller.
With the advent of the BlackBerry, Palm devices, and others in the mid- to late-1990s, the emphasis shifted to data capabilities, especially email and text messaging. Consumers loved those button-based keyboards, and BlackBerry snatched the crown of market leadership from Nokia and Motorola.
Fast-forward to 2007, when the vector of differentiation shifted once again with the debut of Apple’s iPhone. Now it was about display and apps. In a revolutionary move, Apple eliminated the physical keyboard to maximize real estate for glass. It also created the App Store, a thriving ecosystem of applications that contributed to Apple’s breathtaking market success. RIM, the maker of the BlackBerry, was never able to make the transition to a media-centric phone and slipped into oblivion.
Innovation along the display and media vector led to larger and larger phones, ironically the reverse of the “make it smaller” innovation in the earliest mobile phones. Recent models like Samsung’s Galaxy S8 Plus have pushed the limit of this dimension—a return to brick-worthy phone surface area—and we have seen the rise of bezel-less phones (those with no borders around the screen) and those with more powerful processors, better displays, and more powerful cameras. Samsung and Apple have led innovation in this space.
Now, the vector of differentiation is shifting yet again, away from hardware altogether. We are on the verge of a major shift in the phone and device space, from hardware as the focus to artificial intelligence (AI) and AI-based software and agents.
This means nothing short of a redefinition of the personal electronics that matter most to us. As AI-driven phones like Google’s Pixel 2 and virtual-agent devices like the Amazon Echo proliferate, smart devices that understand and interact with us and offer virtual and/or augmented reality will become a larger part of our environment. Today’s smartphones will likely recede into the background.
As we have seen, when the vector of differentiation shifts, market leaders tend to fall by the wayside. In the brave new world of AI, Google and Amazon have the clear edge over Apple. Consider Google’s Pixel 2 phone: driven by AI-based technology, it offers unprecedented photo-enhancement features and deeper hardware-software integration, such as real-time language translation when used with Google’s Pixel Buds headphones.
Similarly, the Amazon Echo enables natural conversations through the Alexa virtual agent. Next-generation devices will use AI and deep learning to recognize our voices, faces, and emotions. We will move from touch to touchless interactions, and from software apps to AI-powered skills. Just as Apple did with the App Store, Amazon has created an Alexa skills store where third parties offer skills that enable the Echo to do everything from setting your kitchen temperature to playing Jeopardy with you. “There’s a skill for that” will apply to more and more consumer needs.
The shifting vector of differentiation to AI and agents does not bode well for Apple.
The advent of Amazon’s skills store and similar innovations speaks to the need to create an AI-rich ecosystem where hardware, software, and third-party contributors work in concert to enhance the consumer experience across life domains. Amazon is making rapid progress along this vector of differentiation, as are Google (with its TensorFlow open-source platform for AI applications) and even Microsoft.
Sheets of glass are simply no longer the most fertile ground for innovation. That means Apple urgently needs to shift its focus and investment to AI-driven technologies, as part of a broader effort to create the kind of ecosystem Amazon and Google are building quickly. However, Apple is falling behind in the AI race, as it remains a hardware company at its core and it has not embraced the open-source and collaborative approach that Google and Amazon are pioneering in AI.
The history of mobile phones suggests that when vectors of differentiation shift, so does market leadership. Apple has only to look at former dominant businesses like Motorola, Nokia, and BlackBerry to understand how quickly a leader can fall from the peak in this market, and do its best to avert this outcome.
Mohanbir Sawhney is the McCormick Foundation professor of technology at the Kellogg School of Management at Northwestern University. He has no investments in the companies mentioned in this article.

Wednesday, 1 November 2017

With Great Data Comes Great Responsibility

We all know the phrase “with great power comes great responsibility.” What most people are unaware of, however, is that data is the new world power.
All over the world, across multiple industries, businesses and individuals alike are beginning to harness the power that good data can unlock.
Data, however, is a tool: applied correctly, it gives you the power to revolutionize your business; used incorrectly, it can have disastrous effects. It is therefore your responsibility to be the master of your data.

The Golden Record – Your Weapon of Choice

Big data seems to be the buzzword of the moment. With almost infinite ways to collect data and even more ways to apply it, it’s easy to feel overwhelmed. Fear not, however: the way you manage this data can turn it from a garbage heap into a goldmine with relative ease.
Within the world of data management, there is a concept called ‘The Golden Record’. According to WhatIs.com, the golden record is a single, well-defined version of all the data entities in an organizational ecosystem. It is also known as the single version of the truth or, in a customer context, the Single Customer View, as it gives you one record containing the purest and most complete set of data.
By making sense of your data to create a single customer view, with a golden record for each entity in your database, you will be well on your way to data mastery, and you will be far better prepared for the GDPR, which comes into force in May 2018. Data cleansing software or master data management software is often the first port of call in achieving this.

Forging the Golden Record

In order to create a single golden record, duplicate records must be matched and merged into a single record. This involves various intelligent processes.
First, let’s consider the process of matching.
Various smart rules must be put in place so that software can iterate through the database and identify which records are actually duplicate entries. The most common error-causing case is a person who has been entered into a database twice, with each entry containing a slight variation of the same name.
Consider the instance below:
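Record 1: Robert Smith, phone 01234 567890, postcode SW1A 1AA
Record 2: Rob Smith, phone 01234 5, postcode SW1A 1AB
(The values above are illustrative.)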
With the use of fuzzy matching software and various other intelligent processes, it becomes clear that these two records are in fact the same person: Rob is an abbreviation of Robert, the last name is Smith in both cases, the first five digits of each phone number are the same, and the postcodes are very similar. Good matching software will have a wide variety of intelligent tools available to identify matches in the database.
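As a minimal sketch of how such rules might look in practice (using the illustrative records above; real data-quality tools apply far more sophisticated techniques):

    from difflib import SequenceMatcher

    # Illustrative records (hypothetical values, as above)
    rec1 = {"first": "Robert", "last": "Smith", "phone": "01234 567890", "postcode": "SW1A 1AA"}
    rec2 = {"first": "Rob", "last": "Smith", "phone": "01234 5", "postcode": "SW1A 1AB"}

    # A tiny sample nickname lookup; real tools ship large curated lists
    NICKNAMES = {"rob": "robert", "bob": "robert"}

    def similarity(a: str, b: str) -> float:
        """Fuzzy string similarity in [0, 1]."""
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    def is_probable_duplicate(r1: dict, r2: dict) -> bool:
        # Rule 1: normalize known nicknames, then fuzzy-compare first names
        f1 = NICKNAMES.get(r1["first"].lower(), r1["first"].lower())
        f2 = NICKNAMES.get(r2["first"].lower(), r2["first"].lower())
        first_ok = similarity(f1, f2) > 0.8
        # Rule 2: exact match on last name
        last_ok = r1["last"].lower() == r2["last"].lower()
        # Rule 3: the first five digits of each phone number are the same
        phone_ok = r1["phone"][:5] == r2["phone"][:5]
        # Rule 4: the postcodes are very similar
        post_ok = similarity(r1["postcode"], r2["postcode"]) > 0.8
        # Flag as a duplicate when most of the rules agree
        return sum([first_ok, last_ok, phone_ok, post_ok]) >= 3

    print(is_probable_duplicate(rec1, rec2))  # True for the pair above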
Now that the records are matched, the next step in the process is Merging.
In order to create the coveted “Golden Record”, a single accurate version of the two records above must be created.
This step, like matching, also requires intelligent rules and workflows to assess which record is the most trustworthy. For example, we can see that the first five digits of each phone number are the same, but only record 1 is complete, so it is likely that the phone number from record 1 is correct. This is the phone number that will be used in the “Golden Record”.
Likewise, the software will know that Rob is an abbreviation of Robert, so Robert would be used.
Lastly, because both postcodes are in the correct format, a series of verification, validation, and authentication steps could be employed to work out which postcode is correct. However, for the sake of this example, we will say that the owner of the database trusts record 1 because it comes from a more trusted source. Thus the postcode and home number from record 1 will be used in the “Golden Record”.
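A rough sketch of such survivorship rules, continuing the illustrative example above (real master data management tools make these rules and workflows fully configurable):

    # Same illustrative records and nickname lookup as in the matching sketch
    NICKNAMES = {"rob": "robert", "bob": "robert"}
    rec1 = {"first": "Robert", "last": "Smith", "phone": "01234 567890", "postcode": "SW1A 1AA"}
    rec2 = {"first": "Rob", "last": "Smith", "phone": "01234 5", "postcode": "SW1A 1AB"}

    def merge(r1: dict, r2: dict, trusted: int = 1) -> dict:
        """Merge two matched records into a single golden record.

        Survivorship rules (illustrative): prefer complete values over
        incomplete ones, expand nicknames to full names, and break ties
        in favour of the more trusted source.
        """
        preferred, other = (r1, r2) if trusted == 1 else (r2, r1)
        golden = {}
        for field in preferred:
            a, b = preferred[field], other[field]
            # Keep the longer (more complete) value; on a tie, keep the
            # value from the more trusted source
            golden[field] = a if len(a) >= len(b) else b
        # Expand a surviving nickname to the full name if one is known
        full = NICKNAMES.get(golden["first"].lower())
        if full:
            golden["first"] = full.capitalize()
        return golden

    print(merge(rec1, rec2))
    # {'first': 'Robert', 'last': 'Smith', 'phone': '01234 567890', 'postcode': 'SW1A 1AA'}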
After these rules have been applied, the two records above are merged into one complete, accurate entry. This is the “Golden Record”.
With all of your records in this format, you unlock your data and its numerous benefits: increased customer satisfaction, less wasted money, reports built on accurate data, GDPR compliance, and trustworthy decisions made on trustworthy data. With the onset of the new GDPR regulation, businesses have a major responsibility to comply. We don’t see this as a negative; we see it as an opportunity to take responsibility for your data and get it in order.

@Martin

Data Warehouse and Data Lake Analytics Collaboration

This blog was written with the thoughtful assistance of David Leibowitz, Dell EMC Director of Business Intelligence, Analytics & Big Data.
So data warehousing may not be cool anymore, you say? It’s yesterday’s technology (or 1990s technology, if you’re as old as me) that served yesterday’s business needs. And while it’s true that recent big data and data science technologies, architectures, and methodologies seem to have relegated data warehousing to the back burner, it is entirely false that there is no critical role for the data warehouse and Business Intelligence in digitally transformed organizations.
Maybe the best way to understand today’s role of the data warehouse is with a bit of history. And please excuse us if we take a bit of liberty with history (since we were there for most of this!).

Phase 1: The Data Warehouse Era

In the beginning, Gods (Ralph Kimball and Bill Inmon, depending upon your data warehouse religious beliefs) created the data warehouse. And it was good. The data warehouse, coupled with Business Intelligence (BI) tools, served the management and operational reporting needs of the organization so that executives and line-of-business managers could quickly and easily understand the status of the business, identify opportunities, and highlight potential areas of under-performance (see Figure 1).
Figure 1: The Data Warehouse Era
The data warehouse served as a central integration point, collecting, cleansing, and aggregating a variety of data sources: AS/400, relational, and file-based (such as EDI). For the first time, data from supply chain, warehouse management, AP/AR, HR, and point of sale was available in a “single version of the truth.”
Extraction-transform-load (ETL) processing wasn’t always quick, and it could require a degree of technical gymnastics to bring together all of these disparate data sources. At one point, the “enterprise service bus” entered the playing field to lighten the load of ETL maintenance, but the routines quickly went from proprietary data sources to proprietary (and sometimes arcane) middleware business logic code (anyone remember Monk?).
The data warehouse supported reports and interactive dashboards that enabled business management to have a full grasp of the state of the business. That said, report authoring was static and did little to democratize data. Typically, the nascent concept of self-service BI was limited to cloning a subset of the data warehouse into smaller data marts, plus extracts to Excel for business analysis purposes. This proliferation of additional data silos created reporting environments that were out of sync (remember the heated sales meetings where teams couldn’t agree on which report figures were correct?), and the analysis paralysis caused by spreadmarts meant that more time was spent working the data than driving insight. But we all dealt with it, as it was agreed that some information (no matter the effort it took to acquire) was more important than no data.

Phase 2: Optimize the Data Warehouse

But IT Man grew unhappy with being held captive by proprietary data warehouse vendors. The costs of proprietary software and expensive hardware (and let’s not even get started on user-defined functions in PL/SQL and proprietary SQL extensions that created architectural lock-in) forced organizations to limit the amount and granularity of data in the data warehouse. IT Man grew restless and looked for ways to reduce the costs associated with operating these proprietary data warehouses while delivering more value to Business Man.
Then Hadoop was born out of the ultra-cool and hip labs of Yahoo. Hadoop provided a low-cost data management platform, leveraging commodity hardware and open-source software, that was estimated to be 20x to 100x cheaper than proprietary data warehouses.
Man soon realized the financial and operational benefits afforded by a commodity-based, natively parallel, open-source Hadoop platform, which could provide an Operational Data Store (now that’s really going old school!) to off-load those nasty extract-transform-load (ETL) processes from the expensive data warehouse (see Figure 2).
Figure 2: Optimize the Data Warehouse

The Hadoop-based Operational Data Store was deemed very good, as it helped IT Man to decrease spending on the data warehouse (guess that’s not so good if you were a vendor of those proprietary data warehouse solutions…and you know who you are, T-man!). Since it’s estimated that ETL consumes 60% to 90% of data warehouse processing cycles, and since some vendors licensed their products based upon those cycles, this concept of “ETL Offload” could provide substantial cost reductions. So, in an environment limited by Service Level Agreements (because outside of Doc Brown’s DeLorean equipped with a flux capacitor, there are still only 24 hours in a day in which to do all the ETL work), Hadoop provided a low-cost, high-performance environment for dramatically slowing the investment in proprietary data warehouse platforms.
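To make the pattern concrete, here is a minimal present-day sketch of ETL offload (the original offload jobs were typically MapReduce, Pig, or Hive; the paths, table, and column names below are hypothetical):

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("etl-offload").getOrCreate()

    # Land the raw extracts cheaply in Hadoop (hypothetical path and schema)
    orders = spark.read.option("header", True).csv("hdfs:///landing/orders/*.csv")

    # Do the heavy transformation work on commodity hardware...
    daily = (orders
             .withColumn("order_date", F.to_date("order_ts"))
             .groupBy("order_date", "store_id")
             .agg(F.sum("amount").alias("revenue")))

    # ...and load only the cleansed aggregate into the warehouse, freeing
    # the expensive data warehouse cycles for reporting and analysis
    daily.write.jdbc(url="jdbc:postgresql://dw-host/dw", table="daily_revenue",
                     mode="append", properties={"user": "etl", "password": "..."})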
Things were getting better, but still weren’t perfect. While IT Man could shave costs, he couldn’t make the tools easy for everyday data consumers (like Executive Man) to use. And while Hadoop was great for storing unstructured and semi-structured data, it couldn’t always keep up with the speeds relied upon for relational or cube-based reporting from traditional transactional systems.

Phase 3: Introducing Data Science

Then God created the Data Scientists (or maybe it was the Devil, depending upon one’s perspective). The data scientists needed an environment where they could rapidly ingest high volumes of granular structured (tables), semi-structured (log files), and unstructured (text, video, images) data. They realized that data beyond the firewall was needed in order to drive intelligent insight. Data such as weather, social, sensor, and third-party data could be mashed up with the traditional data stores in the EDW and Hadoop to determine customer insight, customer behavior, and product effectiveness. This made Marketing Man happy. The scientists needed an environment where they could quickly test new data sources, new data transformations and enrichments, and new analytic techniques in search of those variables and metrics that might be better predictors of business and operational performance. Thus, the analytic sandbox, which also runs on Hadoop, was born (see Figure 3).

Figure 3: Introducing Data Science

The characteristics of a data science “sandbox” couldn’t be more different from those of a data warehouse: the warehouse is a governed, schema-on-write production environment with strict SLAs and a single version of the truth, while the sandbox is a loosely governed, schema-on-read environment built for rapid experimentation, where it is perfectly acceptable for most hypotheses to fail.

Finance Man tried desperately to combine these two environments, but the audiences, responsibilities, and business outcomes were just too different to deliver cost-effective business reporting and predictive analytics in a single bubble.
Ultimately, the analytic sandbox became one of the drivers for the creation of the data lake that could support both the data science and data warehousing (Operational Data Store) needs.
Data access was getting better for the data scientists, but we were again moving toward a proprietary process and a technical skill set reserved for the elite. Still, things were good, as IT Man, Finance Man, and Marketing Man could work through the data scientists to drive innovation. But they soon wanted more.

Phase 4: Creating Actionable Dashboards

But Executive Man was still unsatisfied. The Data Scientists were developing wonderful predictions about what was likely to happen and prescriptions about what to do, but the promise of self-service BI was still missing. Instead of running to IT Man for reports, as in the old days, he was now requesting them from the Data Scientist.
The reports and dashboards created to support executive and front-line management in Phase 1 were the natural channel for rendering the predictive and prescriptive insights, effectively closing the loop between the data warehouse and the data lake. With data visualization tools like Tableau and Power BI, IT Man could finally deliver on the promise of self-service BI by providing interactive descriptive and predictive dashboards that even Executive Man could operate (see Figure 4).

Figure 4: Closing the Analytics Loop

And Man was happy (until the advent of Terminator robots began making decisions for us).

Which machine learning algorithm should I use?

A typical question asked by a beginner, when facing a wide variety of machine learning algorithms, is “which algorithm should I use?” The answer to the question varies depending on many factors, including:
  • The size, quality, and nature of data.
  • The available computational time.
  • The urgency of the task.
  • What you want to do with the data.
Even an experienced data scientist cannot tell which algorithm will perform best before trying several. We are not advocating a one-and-done approach, but we do hope to provide some guidance on which algorithms to try first, depending on some clear factors.
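In practice, that advice amounts to benchmarking a handful of candidate algorithms on your data before committing to one. A minimal sketch with scikit-learn, using the built-in iris dataset as a stand-in for your own data:

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)  # stand-in for your own dataset

    # A few candidates to try first; none is guaranteed to win a priori
    candidates = {
        "logistic regression": LogisticRegression(max_iter=1000),
        "kernel SVM": SVC(kernel="rbf"),
        "random forest": RandomForestClassifier(n_estimators=100),
    }

    # Compare mean accuracy under 5-fold cross-validation
    for name, model in candidates.items():
        scores = cross_val_score(model, X, y, cv=5)
        print(f"{name}: mean accuracy = {scores.mean():.3f}")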
The machine learning algorithm cheat sheet
The full article describes when to use each of the following algorithms:
  • Linear regression and Logistic regression 
  • Linear SVM and kernel SVM
  • Trees and ensemble trees
  • Neural networks and deep learning
  • k-means/k-modes, GMM (Gaussian mixture model) clustering
  • DBSCAN
  • Hierarchical clustering
  • PCA, SVD and LDA
Reference: Hui Li, “Which machine learning algorithm should I use?”, SAS blog.