the other day at Oracle OpenWorld.
it sponsor. A generation ago such decisions relied on a little bit of analysis
and a lot of gut instinct.
But here in 2012, our social media-powered
culture gives us hard data on which such multimillion-dollar decisions can be
based. The problem today, however, is that most companies have too much data: 5 million Tweets and 27 million other data points, to be exact.
that data to find the ideal athlete to endorse Lexus. (Ellison also uncovered
an impressive showing for BlackBerry as a source of Tweets, although he did
couch that observation in a disclaimer.)
Oracle Big Data as we begin to investigate below.
Problem with Traditional Data Analytics:
Traditional data analytics techniques depend on structured data from a
limited number of sources that produce neither high volumes of data nor data
at a high rate. In plain English, they tend to be based on small data sets produced
either slowly or over a long time.
and maturation of digital technology has changed that. Systems are now capable
of producing and storing a high volume of data, created at a very high rate
and coming from various sources. Further complicating things is the fact that
much of this data arrives in non-structured formats such as e-mail, social
networking sites, sensors and video.
example illustrates, companies can analyze Big Data to make marketing,
business strategy and customer management decisions.
Why We Need Big Data:
- Non-structured data along with structured data must be analysed using the right set of tools to find critical information about the relationships, usage and patterns important for business.
- Data is explored through an iterative process: make a hypothesis, create statistical, visual, or semantic models, validate them, and then refine the hypothesis.
- The real challenge in big data is to manage the 3 V's (volume, velocity and variety) of the new sources, along with existing structured data, using a new generation of analytics.
- It helps to store, share, visualize and analyse datasets with sizes ranging up to petabytes.
- Big Data helps in the analysis of unstructured data by filtering low-density data to produce high-density data with facts.
Data Requirements:
requires a platform where we can store and process data.
there are 4 phases:
- Acquiring big data: The data from various sources must be stored in a system that handles variety and volume, offers low latency and high transaction throughput, and supports flexible data structures. The Oracle solution uses NoSQL, HDFS and enterprise applications to acquire data.
- Organizing big data: Organizing the data means integrating it. The objective is to process and manipulate data at its original storage location with a system that supports distributed, high-volume processing. Apache Hadoop with MapReduce and the Oracle Big Data Connectors can be used for organizing data.
- Analysing big data: Analysing the data may require processing distributed data sets locally, where they reside. Apache Hadoop MapReduce can generate intermediate results and load them into traditional data warehouses, where the data can be analysed further. The data warehouse can also be used in data analysis to generate results.
- Decide: The system can use various analytic applications to draw conclusions based on the data from the warehouse.
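To make the four phases a little more concrete, here is a minimal sketch of the map, shuffle and reduce steps run in memory with plain Python. It is not Hadoop code and uses no Oracle API; the sample records, the field names and the `decide` rule are all made up for illustration.

```python
from collections import defaultdict

# Acquire: hypothetical sample records standing in for loosely structured input.
RAW_TWEETS = [
    {"athlete": "Athlete A", "source": "iPhone", "text": "great race"},
    {"athlete": "Athlete B", "source": "BlackBerry", "text": "what a finish"},
    {"athlete": "Athlete A", "source": "Web", "text": "gold!"},
    {"source": "Web", "text": "record with no athlete field"},
]


def map_phase(record):
    """Emit (athlete, 1) for each record; skip records lacking the field."""
    athlete = record.get("athlete")
    if athlete is not None:
        yield athlete, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def run_job(records):
    """Organize and analyse: map every record, shuffle, then reduce."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


def decide(mention_counts):
    """Decide: a deliberately naive rule that picks the most-mentioned key."""
    return max(mention_counts, key=mention_counts.get)


mention_counts = run_job(RAW_TWEETS)
best = decide(mention_counts)
```

In a real cluster the map and reduce functions would run in parallel on the nodes that hold the data, and the reduced counts would be the kind of intermediate result that gets loaded into a warehouse for further analysis.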
Data Appliances:
data appliance is a set of software and hardware. Oracle Big Data uses a
combination of Exalogic, Exadata and Exalytics along with the required software to
provide the capabilities needed for capturing, processing and analysing data. Oracle big data appliances include the following:
- Cloudera's (www.cloudera.com) distribution including Apache Hadoop.
- Cloudera Manager to manage the above.
- Open source distribution of the statistical package R (www.r-project.org) for analysis of unfiltered data on Big Data appliances.
- Oracle NoSQL database: It stores key-value data in a distributed manner, hiding the underlying topology and data location while providing low latency for data requests. The driver part of the database hides these details and presents a simple-to-use interface/API for data access. The open community edition is installed as part of the Big Data Appliance integrated software.
- Oracle Enterprise Linux operating system with Oracle JVM.
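The idea of a driver that hides topology and data location can be illustrated with a toy key-value store that hashes each key to a shard. This is only a sketch in plain Python; the class name, the shard count and the sample key are hypothetical, and none of it reflects the actual Oracle NoSQL Database API.

```python
import hashlib


class ShardedKVStore:
    """Toy key-value store: callers never see which shard holds a key."""

    def __init__(self, shard_count=3):
        # Each dict stands in for a storage node in the cluster.
        self._shards = [dict() for _ in range(shard_count)]

    def _shard_for(self, key):
        # A stable hash of the key decides which shard owns it.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % len(self._shards)
        return self._shards[index]

    def put(self, key, value):
        self._shard_for(key)[key] = value

    def get(self, key, default=None):
        return self._shard_for(key).get(key, default)


store = ShardedKVStore()
store.put("user/42/profile", {"name": "example"})
profile = store.get("user/42/profile")
missing = store.get("user/99/profile")
```

The caller only ever uses `put` and `get`; where the record physically lives is decided inside the store, which is the property the driver is described as providing.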
integration of data set for analysing the data. You can install the Big Data
Connectors on a Big Data Appliance or on a Hadoop cluster to connect to an Oracle 11g database.
the 4 connectors for big data.
- Oracle Loader for Hadoop: Used to load data from a Hadoop system into an Oracle 11g database. It uses an efficient MapReduce job as the last step to generate data in native Oracle format, which can then be easily used in SQL and analysed by different tools.
- Oracle Direct Connector for Hadoop Distributed File System: This connector provides access to HDFS from an Oracle 11g database. It allows HDFS data to be queried using SQL and joined with existing tables in the database. External tables based on HDFS can be created in the database and used for query and analysis purposes.
- Oracle Data Integrator Application Adapter for Hadoop: This adapter is useful for integrating with an Oracle database using Oracle data interchange standards. It is useful when you have an existing Hadoop system and want to integrate it with an Oracle database without going for Oracle big data appliances.
- Oracle R Connector for Hadoop: Used for connecting the open source statistical environment R, and the models and systems running on it, with Hadoop. Running R models against large volumes of Hadoop data becomes transparent, and the user need not learn any other system or API.
or Exadata, the next step is to use appropriate tools for analysis.
the tools which can be run for analysis purpose.
- Oracle R Enterprise: For running statistics-based models and analysis.
- Database Mining: Creating predictive models based on high-volume data with the help of BI tools used for predictive analysis.
- Text mining: Mining text data from various sources such as social media, blogs, enterprise-class software and net-based text data like search information to analyse usage, patterns and activities.
- Semantic Analysis: For creating models of the relationships between data sets and data points.
- Spatial analysis: For creating dimension-based spatial models that help to understand the relationship between data and dimensions.
- MapReduce: For running complex pattern-finding jobs on the database data.
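As a rough illustration of the text mining item above, the sketch below tokenizes a few posts and counts the most frequent terms. It uses only the Python standard library, not any Oracle tool; the stop-word list and the sample posts are invented for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"the", "a", "an", "is", "to", "and", "of", "for", "in"}


def tokenize(text):
    """Lower-case the text and keep word-like tokens that are not stop words."""
    words = re.findall(r"[a-z0-9#@']+", text.lower())
    return [word for word in words if word not in STOP_WORDS]


def top_terms(documents, n=3):
    """Count terms across all documents and return the n most frequent."""
    counts = Counter()
    for document in documents:
        counts.update(tokenize(document))
    return counts.most_common(n)


# Made-up posts standing in for social media or blog text.
POSTS = [
    "The upgrade to the new dashboard is great",
    "Dashboard reports load fast after the upgrade",
    "Looking for upgrade tips",
]

terms = top_terms(POSTS, n=2)
print(terms)  # [('upgrade', 3), ('dashboard', 2)]
```

Real text mining adds stemming, phrase detection and sentiment scoring on top of this, but term counting is usually where the usage and pattern analysis starts.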
to OBIEE….?
Ellison promised that Big Data would soon be in the reach of medium and even
small companies via Oracle’s commitment to the Cloud.
that moment hasn’t quite arrived yet.
you’re seeking a Business Intelligence solution that is a little easier to wrap
your hands around, we invite you to attend IT Convergence’s final Oracle
OpenWorld 2012 presentation: “Planning
and Executing an Upgrade to Oracle Business Intelligence Enterprise Edition
11g” (CON3812) to be held at 2:15 in the InterContinental-Sutter.
For
those of you who’re already heading home or simply can’t make it, fear not. You
can download copies of this presentation and all of our Oracle OpenWorld 2012
presentations at the IT Convergence Oracle OpenWorld 2012 resources page.