
Thoughts on Larry Ellison’s Big Data Demo

Oracle founder and CEO Larry Ellison provided a Big Data demonstration the other day at Oracle OpenWorld.

The results were impressive.
Lexus had a problem: it wanted to determine which Olympic athlete it should sponsor. A generation ago, such decisions relied on a little bit of analysis and a lot of gut instinct.

But here in 2012, our social-media-powered culture gives us hard data on which such multi-million-dollar decisions can be based. The problem today, however, is that most companies have too much data: in this case, 5 million Tweets and 27 million other data points, to be exact.

That's where Ellison chose to show off Big Data, using it to sift through that data to find the ideal athlete to endorse Lexus. (Ellison also uncovered an impressive showing for BlackBerry as a source of Tweets, although he did couch that observation with a disclaimer.)
Ellison got everyone's attention, but there is a bigger story behind Oracle Big Data, which we begin to investigate below.
The Problem with Traditional Data Analytics:

Traditional data analytics techniques depend on structured data from a limited number of sources, with neither high volume nor a high rate of creation. In plain English, they tend to be based on small data sets produced either slowly or over a long time.
The internet and the maturation of digital technology have changed that. Systems are now capable of producing and storing data at very high volumes and very high rates of creation, from a wide variety of sources. Further complicating things, much of this data arrives in unstructured formats: e-mail, social networking sites, sensor feeds, video, and so on.
As Ellison's example illustrates, companies can analyze Big Data to drive decisions about marketing, business strategy, customer management, and more.

Why We Need Big Data:
  1. Unstructured data, along with structured data, must be analyzed with the right set of tools to uncover the relationships, usage, and patterns that matter to the business.
  2. Insight comes from an iterative process of refining models: form a hypothesis; build statistical, visual, or semantic models; validate the results; and refine the hypothesis.
  3. The real challenge in Big Data is to manage the 3 V's (volume, velocity, and variety) of these sources, along with structured data, using a new generation of analytics.
  4. Big Data platforms help store, share, visualize, and analyze data sets ranging up to petabytes in size.
  5. Big Data helps in the analysis of unstructured data by filtering low-density raw data down to high-density, factual information.
Big Data Requirements:
Big Data requires a platform on which we can store and process the data. Typically, there are four phases:

  1. Acquiring Big Data: Data from various sources must be captured in a store that handles variety and volume with low latency, high transaction rates, and flexible data structures. Oracle's solution uses NoSQL, HDFS, and enterprise applications to acquire data.
  2. Organizing Big Data: Organizing the data means integrating it. The objective is to process and manipulate the data at its original location, on a system that supports distributed, high-volume processing. Apache Hadoop with MapReduce and the Oracle Big Data Connectors can be used to organize data.
  3. Analyzing Big Data: Analyzing the data may require processing the distributed data set locally. Apache Hadoop MapReduce can generate intermediate results and load them into a traditional data warehouse, where the data can be analyzed further (a minimal sketch follows this list). The data warehouse can also be used in the analysis to generate further results.
  4. Deciding: The system can use various analytic applications to draw conclusions based on the data in the warehouse.
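To make step 3 concrete, here is a minimal, hypothetical sketch (not Oracle's demo code) of a Hadoop MapReduce job that counts keyword mentions across tweet text stored in HDFS, producing the kind of intermediate result that could later be moved into a data warehouse. The class names and the input/output paths are illustrative only.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MentionCount {

    // Mapper: emit (token, 1) for every whitespace-separated token in a line of tweet text.
    public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            for (String token : line.toString().toLowerCase().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    // Reducer (also used as combiner): sum the counts for each token.
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text token, Iterable<IntWritable> counts, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable c : counts) {
                sum += c.get();
            }
            context.write(token, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "tweet mention count");
        job.setJarByClass(MentionCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // e.g. an HDFS directory of tweet text
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // intermediate results for the warehouse
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The output is a simple (token, count) data set sitting in HDFS; in the architecture described here, the Oracle Big Data Connectors discussed below are what move or expose that result to the 11g database.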
Big Data Appliances:
The Oracle Big Data Appliance is a set of software and hardware. Oracle's Big Data offering uses a combination of Exalogic, Exadata, and Exalytics, along with the required software, to provide the capabilities needed for capturing, processing, and analyzing data. The Oracle Big Data Appliance includes the following:
  1. Cloudera's Distribution Including Apache Hadoop (www.cloudera.com).
  2. Cloudera Manager, to manage the above.
  3. An open-source distribution of the statistical package R (www.r-project.org) for analysis of unfiltered data on the appliance.
  4. Oracle NoSQL Database: It stores key-value data in a distributed manner, hiding the underlying topology and data location while providing very low latency for data requests. The database's driver hides these details and presents a simple-to-use interface/API for data access (a short usage sketch follows this list). The open Community Edition is installed as part of the Big Data Appliance integrated software.
  5. The Oracle Enterprise Linux operating system with the Oracle JVM.
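As an illustration of the key-value model described in item 4, here is a minimal sketch using the Oracle NoSQL Database Java driver's basic put/get calls. The store name, helper host/port, and key values are hypothetical placeholders; on a Big Data Appliance you would substitute your own store topology.

import java.util.Arrays;

import oracle.kv.KVStore;
import oracle.kv.KVStoreConfig;
import oracle.kv.KVStoreFactory;
import oracle.kv.Key;
import oracle.kv.Value;
import oracle.kv.ValueVersion;

public class KvSketch {
    public static void main(String[] args) {
        // Store name and helper host:port are hypothetical placeholders.
        KVStore store = KVStoreFactory.getStore(
                new KVStoreConfig("kvstore", "bda-node01:5000"));

        // Keys are paths of string components; values are opaque byte arrays.
        Key key = Key.createKey(Arrays.asList("tweet", "314159"));
        store.put(key, Value.createValue(
                "just watched the 100m final #London2012".getBytes()));

        // The driver routes the request to the right node; the topology stays hidden.
        ValueVersion vv = store.get(key);
        if (vv != null) {
            System.out.println(new String(vv.getValue().getValue()));
        }

        store.close();
    }
}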
Big Data Connectors: The Oracle Big Data Connectors enable integration of data sets for analysis. You can install the connectors on the Big Data Appliance or on a Hadoop cluster to connect to an Oracle 11g database.
Following are the four Big Data connectors:
  1. Oracle Loader for Hadoop: Used to load data from a Hadoop system into an Oracle 11g database. It uses an efficient final MapReduce step to generate data in native Oracle format, which can easily be used in SQL and analyzed by different tools.
  2. Oracle Direct Connector for Hadoop Distributed File System: This connector provides access to HDFS from an Oracle 11g database. It allows HDFS data to be queried using SQL and joined with existing tables in the database. External tables based on HDFS files can be created in the database and used for query and analysis (see the sketch after this list).
  3. Oracle Data Integrator Application Adapter for Hadoop: This adapter integrates Hadoop with an Oracle database using Oracle's data integration tooling. It is useful when you have an existing Hadoop system and want to integrate it with an Oracle database without moving to the Oracle Big Data Appliance.
  4. Oracle R Connector for Hadoop: Used to connect the open-source R statistical environment, and systems running R models, with Hadoop. Running R models against large volumes of Hadoop data becomes transparent, and the user need not learn any other system or API.
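As a sketch of how connector 2 is typically consumed, suppose the Direct Connector has already been used to define an external table over HDFS output (the connection details, table, and column names below are hypothetical). The HDFS-backed data can then be joined with ordinary database tables using plain SQL, for example over JDBC:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class QueryHdfsBackedTable {
    public static void main(String[] args) throws Exception {
        // Connection string, credentials, and table names are hypothetical.
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:oracle:thin:@//dbhost:1521/orcl", "bi_user", "secret");
             PreparedStatement stmt = conn.prepareStatement(
                     // tweet_counts_ext: an external table defined over HDFS files.
                     // athletes: an ordinary table already in the 11g database.
                     "SELECT a.athlete_name, t.mention_count "
                   + "FROM tweet_counts_ext t JOIN athletes a ON a.handle = t.handle "
                   + "ORDER BY t.mention_count DESC");
             ResultSet rs = stmt.executeQuery()) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + ": " + rs.getLong(2));
            }
        }
    }
}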
Analytics: After loading data into the database or Exadata, the next step is to use the appropriate tools for analysis. Following are the tools that can be used:
  1. Oracle R Enterprise: For running statistics-based models and analysis.
  2. Data mining: For creating predictive models based on high-volume data, with the help of BI tools used for predictive analysis.
  3. Text mining: For mining text data from sources such as social media, blogs, enterprise software, and web-based text such as search data, to analyze usage, patterns, and activities.
  4. Semantic analysis: For creating models of the relationships among data sets and data points.
  5. Spatial analysis: For creating dimension-based spatial models that help in understanding the relationships between data and dimensions.
  6. MapReduce: For running complex pattern-finding jobs against the data.
From Big Data to OBIEE…?
Larry Ellison promised that Big Data would soon be within the reach of medium-sized and even small companies via Oracle's commitment to the Cloud.
But that moment hasn't quite arrived yet.
If you're seeking a Business Intelligence solution that is a little easier to wrap your arms around, we invite you to attend IT Convergence's final Oracle OpenWorld 2012 presentation, “Planning and Executing an Upgrade to Oracle Business Intelligence Enterprise Edition 11g” (CON3812), to be held at 2:15 in the InterContinental-Sutter.

For those of you who are already heading home or simply can't make it, fear not. You can download copies of this presentation and all of our Oracle OpenWorld 2012 presentations at the IT Convergence Oracle OpenWorld 2012 resources page.

To
access all of the articles under our Oracle OpenWorld thread, click here.
