- Viscosity: Viscosity is a measure of the resistance encountered by data in flow. The resistance comes from friction: varying integration flow rates, heterogeneous sources of data origination, and the transformations required to convert raw data into information. Efficient messaging systems like Kafka provide strong ordering guarantees in a persistent, high-throughput message queue. Streaming technologies like Storm enable distributed, continuous processing of incoming data in real time. Sophisticated CEP engines further strengthen rule-based, event-driven processing of Big Data, with support for standards such as PMML.
- Virality: Virality is the ability of data to spread over networks, measured as the speed of dispersion across peer-to-peer networks. Time and the number of crosslinks are vital factors that determine the spreading rate. A CDN is a large distributed system that serves content to end users with high performance and availability. P2P-assisted streaming technologies are leveraged for online video by vendors like Netflix.
- Vigilance: Project teams need to be watchful for traps and pitfalls in Big Data implementations. More than a handful of organizations have deployed Hadoop extensively in an attempt to process data in real time, not realizing that Hadoop was designed for batch processing. Users must be careful to treat data in motion and data at rest differently. For example, one can leverage the Lambda architecture to make full use of both batch- and stream-processing methods for massive quantities of data. Hybrid use of SQL and NoSQL is also advantageous, but be alert to operational difficulties.
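The ordering guarantee mentioned under Viscosity can be sketched in miniature: Kafka preserves order only within a partition, and messages with the same key land on the same partition. The sketch below is a hypothetical in-memory toy, not the Kafka API; all class and function names are illustrative.

```python
# Toy sketch of per-partition ordering, Kafka-style: same key ->
# same partition -> relative order preserved. Not real Kafka code.

from collections import defaultdict

def partition_for(key: str, num_partitions: int) -> int:
    """Deterministically map a message key to a partition."""
    return hash(key) % num_partitions

class MiniLog:
    """A partitioned, append-only log (illustration only)."""
    def __init__(self, num_partitions: int = 4):
        self.num_partitions = num_partitions
        self.partitions = defaultdict(list)

    def produce(self, key: str, value: str) -> None:
        # All messages for one key append to the same partition.
        self.partitions[partition_for(key, self.num_partitions)].append((key, value))

    def consume(self, partition: int):
        return list(self.partitions[partition])

log = MiniLog()
for i in range(3):
    log.produce("sensor-a", f"reading-{i}")

p = partition_for("sensor-a", log.num_partitions)
ordered = [value for _, value in log.consume(p)]
print(ordered)  # readings come back in the order they were produced
```

Ordering across different partitions is not guaranteed, which is exactly the trade-off that lets the queue scale horizontally.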
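One common building block behind the CDN behavior described under Virality is consistent hashing: content keys map onto a ring of edge servers, so adding or removing a server relocates only a small fraction of keys. This is a generic sketch under that assumption; the server names and key are made up.

```python
# Minimal consistent-hash ring, as a CDN might use to pin content
# to edge servers. md5 gives a stable hash across runs; names are
# hypothetical.

import hashlib
from bisect import bisect

def stable_hash(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes):
        # Place each node on the ring at its hash position.
        self.ring = sorted((stable_hash(n), n) for n in nodes)

    def node_for(self, key: str) -> str:
        """Walk clockwise to the first node at or after the key's hash."""
        hashes = [h for h, _ in self.ring]
        idx = bisect(hashes, stable_hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["edge-us", "edge-eu", "edge-ap"])
print(ring.node_for("video/cat.mp4"))  # same key always maps to the same edge
```

Production rings typically add many virtual nodes per server to even out the key distribution; this sketch omits that for brevity.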
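The Lambda architecture mentioned under Vigilance can be reduced to its query-time merge: a batch view that is complete but stale, combined with a speed view covering only events since the last batch run. The view names and counts below are invented for illustration.

```python
# Hedged sketch of the Lambda architecture's serving layer:
# answer a query by merging the precomputed batch view with
# incremental counts from the speed layer.

def merge_views(batch_view: dict, speed_view: dict) -> dict:
    """Combine stale-but-complete batch counts with real-time deltas."""
    merged = dict(batch_view)
    for key, delta in speed_view.items():
        merged[key] = merged.get(key, 0) + delta
    return merged

# Batch layer: recomputed periodically over data at rest.
batch_view = {"page/home": 1000, "page/about": 40}
# Speed layer: events that arrived since the last batch run (data in motion).
speed_view = {"page/home": 7, "page/pricing": 3}

print(merge_views(batch_view, speed_view))
# {'page/home': 1007, 'page/about': 40, 'page/pricing': 3}
```

The point of the split is that the speed layer's approximations are continually superseded as the batch layer catches up, so errors in the real-time path cannot accumulate.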
For more information, please contact Tony Shan (blog@tonyshan.com) or leave your comments below.
©Tony Shan. All rights reserved. All standard disclaimers apply here.
