Monday, May 19, 2014

How To Gain Business Value From Hadoop


Apache Hadoop is one of the hottest technologies in Big Data. It is a software framework that allows distributed processing of large data sets across clusters of commodity servers using a simple programming model. This open source platform is managed by the Apache Software Foundation and was named after a toy elephant. It has been very helpful in storing and managing vast amounts of data efficiently and cheaply. Hadoop is designed to scale from a single server to thousands of machines, each offering local storage and computation. Its ecosystem includes HDFS, MapReduce, Hive, Pig and HBase.
Wondering what makes Hadoop special?
The inspiration for Hadoop came from Google’s work on MapReduce, a programming model for distributed computing that allowed big data to be stored, managed and processed across multiple servers. Doug Cutting then created Hadoop to manage data that is too big for conventional databases.
The library is designed with a high degree of fault tolerance. Instead of relying on high-end hardware, a cluster’s resiliency comes from the software’s ability to detect and handle failures at the application level.
Two important qualities of Apache Hadoop are its efficiency and robustness. A Big Data application will continue to run even when individual servers in the cluster fail, and it does not require the application to shuttle large volumes of data across the network.
If you look deeper, you will realise that Hadoop is also modular, which means you can swap almost any of its components for a different software tool. This makes the whole architecture flexible, robust and highly efficient.
One of the reasons for Hadoop’s popularity is that companies like Google, Yahoo, Facebook and Amazon use it on huge data sets. It makes short work of large tasks, managing the big data flowing through these online giants. But they aren’t the only ones who can benefit from it; enterprises are also adopting it on a large scale. The technology is flowing from the big internet companies and being adapted for the enterprise environment.
                  
How Hadoop Works?
Unlike traditional applications and hardware architectures, Hadoop applications are more like batch jobs that transform data from a database into a data warehouse. In general terms, Hadoop takes chunks of data, performs actions on them and hands the results over to another application, and all this takes place at massive scale. This scale is achieved through distributed computing, i.e. a large number of commodity computers processing the data at the same time. The only way to finish this kind of computation in a short timeframe is to distribute it across many commodity servers, up to thousands. These server nodes are grouped into racks and further into clusters, all connected by a high speed network.

Hadoop has two main subprojects, HDFS and MapReduce, with the Job Tracker coordinating the work :-

1.  HDFS (Hadoop Distributed File System)
This component manages the data: it splits the data into blocks, puts them on different nodes and replicates them. HDFS spans all nodes in a cluster for data storage and links the file systems on the local nodes together into one big file system. It achieves reliability by replicating data across multiple nodes.
HDFS runs on a cluster of nodes and can handle a very large number of files. It breaks large files into blocks spread over multiple nodes for efficient storage.
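The splitting and replication described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the function names, node names, block size and round-robin placement are all made up for illustration, and real HDFS uses far larger blocks and a rack-aware placement policy.

```python
from typing import Dict, List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut a file's bytes into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: List[str], replication: int) -> Dict[int, List[str]]:
    """Assign every block to `replication` distinct nodes, round-robin.

    Hypothetical placement rule for illustration only.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


def readable_after_failure(placement: Dict[int, List[str]], failed_node: str) -> bool:
    """The file stays readable if every block still has a replica on a live node."""
    return all(
        any(node != failed_node for node in replicas)
        for replicas in placement.values()
    )


file_bytes = b"x" * 1000                      # stand-in for a large file
blocks = split_into_blocks(file_bytes, block_size=256)
nodes = ["node-a", "node-b", "node-c", "node-d"]
placement = place_replicas(len(blocks), nodes, replication=3)

print(len(blocks), "blocks")
print(placement)
print("still readable without node-b:", readable_after_failure(placement, "node-b"))
```

Because each block lives on several nodes, losing any single node leaves at least one copy of every block, which is the reliability property the text refers to.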

                  

2.  MapReduce
This component understands the work and assigns it to nodes in the cluster. It processes data in parallel, tracks progress and computes the result of the job.
Hadoop is supplemented by an ecosystem of Apache projects like Pig, Hive and ZooKeeper that extend its overall value and improve usability.
A relational database is analyzed using SQL queries. Non-relational databases also use queries, but they are not constrained to SQL and can use other query languages to pull information out. Hadoop itself, however, is not a database with a built-in query language and is closer to a data warehousing system, so it needs a system like MapReduce to process and analyze its data.
MapReduce provides an efficient way of running queries over big data. Instead of copying a file to wherever the query runs, it runs the computation on the same nodes on which the data is stored. After the reduce step, the results are sent back to the user program.
MapReduce runs a series of jobs, each being a separate Java application. There are two steps in the MapReduce process: Map and Reduce. Suppose you have a collection of blog posts about Big Data and want to count the number of times the words "hadoop" and "bigdata" are mentioned. First the file is split on HDFS, then every node runs the map computation on its own portion of the data, i.e. counting the number of times those words show up. After mapping, each node’s output is a list of key-value pairs, and this result is sent to other nodes as input for the reduce step. Before the reduce step starts, these key-value pairs are shuffled and sorted. The reduce step then sums the lists into a single entry for each word.
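The word-count walk-through above can be sketched as a single-process Python program. It is not Hadoop code (a real job would be written as Java Mapper and Reducer classes and run across nodes); the function names and the sample text are assumptions made up for this illustration.

```python
from collections import defaultdict


def map_phase(split, keywords):
    """Emit a (word, 1) pair for every keyword occurrence in one input split."""
    pairs = []
    for word in split.lower().split():
        word = word.strip(".,!?")
        if word in keywords:
            pairs.append((word, 1))
    return pairs


def shuffle(mapped):
    """Group the values of all mappers by key and return the keys in sorted order."""
    grouped = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(grouped):
    """Sum the list of values for each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


# Each string stands in for the part of the file stored on one node.
splits = [
    "Hadoop makes bigdata manageable.",
    "Bigdata tools: Hadoop, Hive and Pig.",
    "Why hadoop?",
]
keywords = {"hadoop", "bigdata"}

mapped = [map_phase(split, keywords) for split in splits]  # runs per node in Hadoop
counts = reduce_phase(shuffle(mapped))
print(counts)  # {'bigdata': 2, 'hadoop': 3}
```

In a real cluster the three map calls would run in parallel on the nodes that hold the splits, and only the small key-value lists would travel over the network.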
          

3.  Job Tracker
It is one of the important components of Hadoop and is responsible for managing everything stated above. If we had to divide terabytes of data manually and copy it to thousands of computers, it would take forever just to kick the job off. So there is a set of components that automates this step. The entire process is a job in Hadoop, and the Job Tracker divides each job into tasks and schedules them to run on nodes. It keeps track of all participating nodes, handles failures and monitors progress. Task Trackers run the tasks and report back to the Job Tracker. With this arrangement Hadoop can distribute jobs over a large number of nodes in parallel.
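A rough sketch of the scheduling and failure handling described above, in plain Python. The function names, task names and the "least loaded node" rule are assumptions for illustration; the real Job Tracker also relies on heartbeats from Task Trackers and prefers nodes that already hold the data, which this sketch ignores.

```python
def schedule(tasks, nodes):
    """Assign tasks to nodes round-robin and return {node: [tasks]}."""
    assignment = {node: [] for node in nodes}
    for index, task in enumerate(tasks):
        assignment[nodes[index % len(nodes)]].append(task)
    return assignment


def handle_failure(assignment, failed_node):
    """Move the failed node's tasks onto the least loaded surviving nodes."""
    orphaned = assignment.pop(failed_node)
    if not assignment:
        raise RuntimeError("no live nodes left to run the job")
    for task in orphaned:
        target = min(assignment, key=lambda node: len(assignment[node]))
        assignment[target].append(task)
    return assignment


tasks = ["map-%d" % i for i in range(6)]
nodes = ["tracker-1", "tracker-2", "tracker-3"]

assignment = schedule(tasks, nodes)
print(assignment)

# tracker-2 stops reporting, so its tasks are rescheduled elsewhere.
assignment = handle_failure(assignment, "tracker-2")
print(assignment)
```

The job still completes after the failure because every task ends up on some live node; only the affected tasks have to be rerun.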
Hadoop is one of the most widely used systems for managing large amounts of data and has surely changed the dynamics and economics of large scale computing.

So let’s boil down some of its powerful capabilities and salient characteristics -
1. Flexibility -
Hadoop is schema-less and can absorb any type of data, structured or unstructured. Data from any number of sources can be aggregated for deeper analysis in arbitrary ways.

2.  Scalability -
New nodes can be added easily without changing data formats, how jobs are written or how data is loaded. Hadoop doesn’t require data to be converted into a different format, as a traditional data warehouse does, so data is not lost in translation. It is a good framework that lets data analysts choose when and how to perform data analysis.

3. Cost Efficiency -
It offers massively parallel computing on commodity servers, which results in a sizeable decrease in the cost per terabyte of storage, and nodes can be added or removed as project demands change. This ability is driving organizations to harness more data for projects which previously never made business sense.

4.  Complex Data Analysis -
Hadoop handles complex and diverse data like images, videos, text, real-time feeds, and data from devices and scientific sensors. While it is often used for petabytes of data, many organizations process data on a terabyte scale. Hadoop with its MapReduce framework abstracts away the complexity of distributed parallel processing across multiple nodes, providing the huge benefits of scaling.

5.  Fault Tolerance  -
It is highly robust. It continues to run even when nodes fail: if you lose a node, the system redirects the work to another location without a break in processing.

Some important sectors where its powerful features can be leveraged include -
1.  Social Media Data   -   With Hadoop you can mine social media data and conversations and make timely decisions to increase market share. It has been used to track media consumption and engagement, advertising, customer retention as well as operations. The video gaming industry is one of its heavy users, analyzing performance and tracking during gameplay.
2.  Clickstream Data   -   It can help a lot with customer segmentation, making it easier to visualize and understand how visitors behave on your website. This clickstream data can be used to improve conversions and to reduce bounces.
3.  Server Log Data   -   Hadoop provides a low cost platform for analyzing server logs, which speeds up and improves security forensics. With Hadoop it is easy to store the logs and to identify and refine patterns in them, providing insights that ease the business decision process.
4.  Financial & Automotive Industry   -   The financial sector uses Hadoop to analyze large scale investments and to make better financial decisions. In the automotive industry it has been used to identify issues with travel arrangements, lower maintenance costs, avoid collisions etc.
Hadoop’s biggest advantage is its speed: it can generate comprehensive reports which would otherwise have taken weeks. Its possibilities in enterprises are endless, and that’s why it will remain the big elephant in the Big Data room for some time to come.
To discuss how we can help you, please contact our team at info@oodlestechnologies.com or skype : oodles.tech

Thursday, April 24, 2014

Cross Platform Mobile App Development


With the rising demand for smartphones and tablets, mobile apps are also becoming ubiquitous. But do you target a single platform for mobile app development, make the extra effort to build your app twice for Android and iOS, or prefer cross platform development? And what does the proliferation of these different devices actually mean for new developers entering the mobile market?

You might have heard of “Cross Platform Development” in the mobile market without being really sure what it is or why to consider it for app development. This blog will shed some light on what it is, its benefits and the reasons for considering cross platform development strategies.

But before going into that, let’s first understand native and web applications with their pros and cons -

1.  Native Apps

Native apps are applications that are installed through an application store like Google Play or Apple’s App Store and are accessed through icons on the device home screen. They are developed for one platform and can access all device features like your contacts, GPS, camera etc. Native apps are developed specifically for a single platform like Android or iOS and use the development tools and languages that platform supports, like Eclipse and Java for Android, and Xcode and Objective-C for iOS. Native apps can work offline and can even use the device’s notification system. They incorporate either standard operating system gestures or app defined gestures.

Native App Advantages

  • The greatest strength of native apps is their sheer power. Native apps easily make use of the device’s software and built-in hardware features. Another important asset is that they can be used offline, even when the user is not connected to a network.
Native App Disadvantages

  • Developing native apps is cumbersome and time consuming, as the app has to be developed separately for each platform (iOS and Android). Moreover, developers have to get their app approved by Google and Apple, and deal with revenue sharing and licensing fees.

2.    Mobile Web Apps (HTML5 Apps)

Web apps are really websites that resemble native applications in look and feel but aren’t implemented as such. These apps are written in HTML5 and run in a web browser. They are accessed like any other web page: you navigate to a special URL and can then install the app on your home screen by bookmarking that page. HTML5 apps use standard web technologies, typically HTML5, JavaScript and CSS.
These applications became popular when HTML5 came around and it was realized that native-like functionality could now be obtained in the web browser. Now that more websites are using HTML5, the distinction between regular web pages and web apps has narrowed.
While mobile developers can build sophisticated apps with HTML5 and JavaScript, these apps still suffer from limitations in secure offline storage, session management, and access to native device functionality like geolocation, camera, calendar etc. Some native features also remain hard to reach or inaccessible in the web browser, like notifications in the background, accelerometer information and other complex gestures.
Many native apps do not take advantage of these features either, but to access such device information you need to create at least a hybrid app or a native app.


HTML5 Advantages
  • Developing this kind of app is quick and saves a lot of time, since it allows you to write the code once and then deploy it on any platform. The code may still need to be optimized for different mobile browsers, but that is less work than coding the app again. Moreover, anyone can use an HTML5 app without having to download and install it on their phone. In addition, because the app lives on the web, it is easy to integrate with usage from a PC.
HTML5 Disadvantages
  • HTML5 apps suffer from the lack of a central location where users can purchase them, so they lack the monetization power that the Apple and Google app stores offer. Because they have limited access to many mobile device features, developers often rely on workarounds to get the functionality they need.

3.   Hybrid apps


Hybrid apps are partly native apps and partly web apps. Like native apps, they live in an app store and can use the device features. They resemble web apps in that they rely on HTML being rendered in a browser, with the caveat that the browser is embedded in the app.
Companies often build hybrid applications as wrappers for an existing web page; this way they gain a presence in the app store without spending much effort on developing a different app. Hybrid apps embed HTML5 apps within a thin native container, combining the best (and worst) elements of both HTML5 and native apps.
Hybrid apps are currently gaining a lot of attention as they allow cross platform development and significantly reduce cost, since the same code can be reused on different mobile operating systems.

What is cross-platform development?

Cross platform development follows the “write once, run everywhere” paradigm.
It is implemented by writing an application using a codebase and technology that allow it to be distributed and deployed across multiple disparate devices, operating systems and platforms. For example, a cross-platform application may run on Linux on the x86 architecture, on Windows on x86, and on Mac OS X on PowerPC or x86 based Apple Macintosh systems. Cross-platform applications may run on all existing platforms or on as few as two.
The biggest challenge of cross platform development is the difference in software stack architecture and hardware capabilities between the devices running on the different platforms. HTML5, however, makes cross-platform application development simpler while providing portability across platforms. If you use PhoneGap, this includes Symbian, Samsung Bada, Apple iOS, Android, BlackBerry 4.6 and higher, HP WebOS and Windows Phone 7 (Mango).
If you use Adobe AIR, this includes the BlackBerry PlayBook and the upcoming BBX platform, Apple’s iOS devices (iPhone and iPad), and Windows Metro (the tablet offering of Windows 8).
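One common way to structure a "write once" codebase is to keep the application logic in shared code and hide each platform behind a thin adapter. The sketch below shows that idea in Python purely for illustration; the class names, the `notify` method and the checkout example are hypothetical and are not part of PhoneGap, Adobe AIR or any other toolkit mentioned here.

```python
class Platform:
    """Hypothetical adapter interface: the only code that differs per platform."""

    name = "generic"

    def notify(self, message):
        raise NotImplementedError


class AndroidPlatform(Platform):
    name = "android"

    def notify(self, message):
        return "[Android toast] " + message


class IOSPlatform(Platform):
    name = "ios"

    def notify(self, message):
        return "[iOS alert] " + message


def cart_total_cents(prices_cents, tax_percent):
    """Shared business logic, written once and reused on every platform."""
    return sum(prices_cents) * (100 + tax_percent) // 100


def checkout(platform, prices_cents, tax_percent=10):
    """Shared flow: compute the total, then let the platform display it."""
    total = cart_total_cents(prices_cents, tax_percent)
    return platform.notify("Total due: %d.%02d" % (total // 100, total % 100))


for platform in (AndroidPlatform(), IOSPlatform()):
    print(checkout(platform, [1000, 550]))
```

Supporting another platform then means writing one more small adapter class, while `cart_total_cents` and `checkout` stay untouched.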

Advantages of Developing Cross Platform Mobile Apps -

The biggest benefit of cross platform development is that it targets multiple platforms and devices with minimal source code. The advantages that come with this “more devices, less code” strategy are -

1.  Reduced Maintenance & Development Cost


These applications originate from a single codebase and need a single development skillset. The codebase can target all platforms and does not require separate staff for each one. Having a single codebase also reduces maintenance costs, as you no longer need to track bugs across several codebases with dedicated staff for each platform.

2.  Lowers Technical  Barriers


App development with JavaScript, Flex and HTML5 is often easier than with Java and Objective-C. Thanks to this ease of development and language familiarity, the technical barrier is lower in cross platform development, which boosts its adoption. It allows more developers and teams to build applications that they previously could not easily build. When developing native apps for different platforms, the development team has to learn Java for Android applications, Objective-C for iOS applications, Silverlight for Windows Phone applications, etc. It requires a developer to be adept in all of these technologies, which is next to impossible. Cross platform development is the better technique here, as one needs to be proficient in only one skillset or language. More focus is then laid on the application being developed rather than on the skillset.

3.  Technical Strength


Certain technologies make some tasks easier; for example, data visualization and programmatic drawing are easy using ActionScript and Flex, while developing an equivalent experience in native code is more time consuming and complex.
The aim of developing any mobile application is to attract more customers and, wherever possible, have high quality engagement with the target market. When the majority of your users are on a single platform the choice is easy, but when you are targeting a large audience spread over multiple platforms, cross platform development is an ideal choice.
Moreover, it is easier to maintain and deploy changes when developing a single application for multiple devices. Updates are easily kept in sync across all platforms.

4 .  Uniform Look and Feel


With cross platform development the same design and overall feel can be maintained across all platforms, as there is a single codebase running. When designing separately for each platform, it is pretty hard to keep development teams of different expertise levels in sync.

5.  Wider Reach and Effective  Marketing


When developing apps for multiple platforms you benefit from the wider audience and exposure that your application can reach. It boosts the market potential of the mobile app, as people on different platforms are now targeted. An app running on iOS, Android and Windows Phone has an added advantage when exposure is taken into account. Marketing also becomes easier, as the app can now be promoted on various media and platforms instead of catering to just a specific set of users.


To discuss how we can help you in your projects, please contact our team at info@oodlestechnologies.com, skype: oodles.tech or visit: http://www.oodlestechnologies.com

Monday, April 7, 2014

Is MongoDB a good choice for your app

These days, when starting a new project, you no longer have to choose only among RDBMSs, since a number of NoSQL products have been created to offer new approaches to data persistence. Some of them offer better read-write performance than classical storage, some offer near-linear horizontal scalability, and some focus on better data representation for more convenient data access in the business domain.
MongoDB is one such NoSQL store that supports sharding, replication and document-oriented persistence. MongoDB is a leading open source, document oriented, cross-platform, schemaless NoSQL database developed by 10gen (now MongoDB, Inc.), which also provides subscriptions, consulting and training for the database.
In MongoDB, structured data is stored as JSON-like documents with dynamic schemas rather than in tables as in a classical relational database, which makes integrating data easier and faster. Unlike MySQL, which is queried using SQL, MongoDB is built around BSON, i.e. binary JSON, which means that much of its functionality can be accessed directly through JavaScript-style notation.
MongoDB comes with its own shell interface for running commands directly against the database. It focuses on objects containing key-value pairs.
NoSQL is a vague term that covers different types of database engines. Its main classes are graph databases, column databases, key/value stores and document databases.
Examples of graph databases are Neo4j and OrientDB; these model the relations between entities. Cassandra and HBase are column databases and are used for processing large amounts of data. Memcached and Redis belong to the key/value stores, where data is stored and retrieved by a specific key. Lastly, MongoDB and Apache CouchDB belong to the document database category.
Document Database -
In a document database like MongoDB the smallest unit is a document. Every record in MongoDB is a document composed of field and value pairs, a data structure similar to a JSON object. The values can include other documents, arrays, or arrays of documents. Documents are stored inside collections, and collections together make up a database. Documents have many advantages: they correspond to native data types in many programming languages, the dynamic schema supports polymorphism, and embedded arrays and documents reduce the need for expensive joins.
Not every document is required to have the same structure; each can have different fields or even sub-documents, normally described as nested or embedded documents. A document database lets you retrieve an object easily, without stitching data together from several places to form a valid object.
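To make the document model concrete, here is a small Python sketch that uses plain dictionaries as stand-ins for documents and a list as a stand-in for a collection. It does not talk to MongoDB; the field names, sample values and helper functions are assumptions invented for this example. The dotted path such as "author.name" mirrors the dot notation MongoDB uses to reach into embedded documents.

```python
import json

# Two documents with different fields living in the same (simulated) collection.
blog_post = {
    "_id": 1,
    "title": "Is MongoDB a good choice for your app",
    "tags": ["mongodb", "nosql"],                      # array value
    "author": {"name": "A. Writer", "team": "blog"},   # embedded document
    "comments": [                                      # array of documents
        {"user": "reader1", "text": "Nice overview"},
        {"user": "reader2", "text": "What about joins?"},
    ],
}
product = {"_id": 2, "name": "Widget", "price": 9.99}

collection = [blog_post, product]


def get_path(doc, dotted):
    """Follow a dotted path such as 'author.name' into nested documents."""
    current = doc
    for part in dotted.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(docs, dotted, value):
    """Return every document whose value at the dotted path equals `value`."""
    return [doc for doc in docs if get_path(doc, dotted) == value]


print(json.dumps(find(collection, "author.name", "A. Writer"), indent=2))
print(find(collection, "price", 9.99))
```

The whole blog post, including its author and comments, comes back as one object, with no join needed to assemble it.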

Why choose MongoDB :-

1. MongoDB is free and open source  -  It is open source, and even with frequent releases and updates it remains stable, with good documentation and a growing community. New functionality is added at a rapid pace with each update.

2. Schemaless and Document Oriented  -  MongoDB does not enforce a schema, which makes it a good choice for rapid software development as you need not spend time on up-front schema design. Unlike a relational database, it stores data in collections of BSON documents, which simplifies the mapping between the database and domain objects. Arrays and nested objects are stored transparently in the DB, making it an apt choice for domains with polymorphic data.

3.  Querying & Aggregation Framework  -  MongoDB provides a powerful querying facility that uses the indexes you have created and can query nested or embedded objects and arrays. For queries that would need MAX, AVG or GROUP BY in SQL, MongoDB offers the Aggregation Framework, which lets you run ad-hoc aggregation queries without writing cumbersome scripts.
4.  Horizontal Scalability  -  MongoDB provides replication and sharding features for building a clustered topology, where replication provides read scaling while sharding provides both read and write scaling.
5.  Intuitive architecture  -   MongoDB has a single primary per replica set, which makes it simpler than peer to peer architectures. It also offers fast writes, which is handy for quickly collecting various statistics.
6.  Multiple PL Support  -  A large number of programming languages can use MongoDB, from Ruby to Java to PHP.
7.  MapReduce  -  It is a powerful processing model for aggregations and batch processing, similar to Hadoop’s, and is used for massive aggregation. In MongoDB the map and reduce functions are written in JavaScript and executed on the mongod servers, and the results are collected in a result collection.
MongoDB even provides incremental MapReduce: it lets you run map-reduce jobs over a collection and merge the results for new data into an existing results collection, which reduces the work.

8.  Role Based Security  -  It lets you assign security policies at the server, database and cluster level.
9.  Fault Tolerance  -  MongoDB offers replica sets for better fault tolerance and for supporting large amounts of data in larger environments. In a replica set all nodes hold copies of the same data, so there is no single point of failure.
10.  Community  -  MongoDB has a large community, with higher level ORM-style libraries that provide a closer mapping to objects.
MongoDB is a particularly good fit when prototyping a new application, to see how to structure a database with free-form objects. It has drivers for nearly all languages, including Perl, .NET, PHP, Python, C/C++ and Node.js.
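The GROUP BY, MAX and AVG style of query mentioned under the Aggregation Framework can be illustrated with a small Python emulation. This sketch runs entirely in memory and does not use a MongoDB driver; the `orders` data, the field names and the `group_stats` helper are assumptions for illustration. The comment shows roughly how the same grouping could be written as a `$group` stage in the mongo shell, assuming a hypothetical `orders` collection.

```python
from collections import defaultdict

# Rough mongo shell equivalent, assuming a hypothetical "orders" collection:
# db.orders.aggregate([
#   { $group: { _id: "$customer",
#               max: { $max: "$amount" },
#               avg: { $avg: "$amount" },
#               count: { $sum: 1 } } }
# ])

orders = [
    {"customer": "alice", "amount": 30},
    {"customer": "bob", "amount": 10},
    {"customer": "alice", "amount": 50},
    {"customer": "bob", "amount": 20},
]


def group_stats(docs, key, field):
    """Emulate GROUP BY `key` with MAX, AVG and COUNT over `field`."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[key]].append(doc[field])
    return {
        group: {
            "max": max(values),
            "avg": sum(values) / len(values),
            "count": len(values),
        }
        for group, values in sorted(buckets.items())
    }


stats = group_stats(orders, "customer", "amount")
print(stats)
```

With the real framework the grouping runs inside the database, so only the small summary per customer is sent back to the application.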
Key Features of MongoDB -
  • High Availability -  Replica sets, MongoDB’s replication facility, are responsible for its high availability. They provide data redundancy and automatic failover.
  • High Performance -  MongoDB supports embedded data models that reduce input/output activity on the DB. Being a document database, MongoDB has no joins and no multi-document transactions, which keeps queries simple, and indexes support faster queries.
  • Automatic Scaling  -  Sharding, also called partitioning, provides scalability in MongoDB. It splits a large data set into chunks distributed across shards and so allows horizontal scaling. MongoDB can move chunks between shards for data distribution and load balancing, and lets you add new nodes elastically.
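The idea behind sharding, routing each document to a partition based on a key, can be sketched with a simple hash in Python. This is an illustrative assumption, not MongoDB's algorithm: MongoDB splits data into chunks by shard key ranges or hashed values and rebalances those chunks, whereas the modulo rule below is the simplest possible stand-in. The shard names, the `user_id` key and both helper functions are made up.

```python
import hashlib


def shard_for(key, shards):
    """Pick a shard from a stable hash of the shard key."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


def distribute(docs, shard_key, shards):
    """Route every document to the shard chosen for its shard key value."""
    layout = {shard: [] for shard in shards}
    for doc in docs:
        layout[shard_for(doc[shard_key], shards)].append(doc)
    return layout


shards = ["shard-0", "shard-1", "shard-2"]
users = [{"user_id": "user-%d" % i, "visits": i} for i in range(20)]

layout = distribute(users, "user_id", shards)
for shard, docs in layout.items():
    print(shard, len(docs), "documents")
```

Reads and writes for a given `user_id` always go to the same shard, so the load is spread across servers. A plain modulo scheme like this would reshuffle most keys when a shard is added, which is one reason real systems manage movable chunks instead.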
Apart from all the benefits MongoDB offers, it comes with a few flaws that should also be considered before adopting it for your business.
Since MongoDB is a NoSQL technology without joins, selecting related data from different collections has to be done manually in the application, which can lead to slight inconsistencies. Moreover, there are no multi-document ACID transactions and hence no automatic rollbacks, but this can be overcome with two-phase commits, in-app locks and entity versions. MongoDB, like many RDBMSs, is not optimized for working on HDDs; it performs well when your indexes fit into RAM and your production servers use SSDs.
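Of the workarounds listed above, entity versions are the easiest to show in a few lines. The sketch below uses a Python dictionary as a stand-in for a collection; the `version` field, the `update_if_unchanged` helper and the account data are assumptions for illustration. Against a real MongoDB collection, the same check is usually done by including the expected version in the update's query filter, so the write only matches if nobody else changed the document first.

```python
class VersionConflict(Exception):
    """Raised when a document changed since the caller last read it."""


def update_if_unchanged(store, doc_id, expected_version, changes):
    """Apply `changes` only if the stored version matches what the caller read."""
    current = store[doc_id]
    if current["version"] != expected_version:
        raise VersionConflict(
            "expected version %d, found %d" % (expected_version, current["version"])
        )
    updated = dict(current)
    updated.update(changes)
    updated["version"] = expected_version + 1
    store[doc_id] = updated
    return updated


store = {"acct-1": {"balance": 100, "version": 1}}

# Two clients read the same version of the document.
seen_by_a = store["acct-1"]["version"]
seen_by_b = store["acct-1"]["version"]

update_if_unchanged(store, "acct-1", seen_by_a, {"balance": 80})  # succeeds

try:
    update_if_unchanged(store, "acct-1", seen_by_b, {"balance": 50})  # stale read
    conflict = False
except VersionConflict:
    conflict = True

print(store["acct-1"], "conflict detected:", conflict)
```

The second writer has to re-read the document and retry, which prevents one update from silently overwriting another even without transactions.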
For setting up MongoDB with Authentication on Windows follow this Blog and for authentication on Ubuntu follow  this  .

To discuss how we can help you, please contact our team at info@oodlestechnologies.com or skype : oodles.tech.

Tuesday, March 25, 2014

Is MongoDB a good choice for your Application

