Browse By Tags

  • What is U-SQL?

    Microsoft has taken another step towards making analysis of big data easier with the introduction of their U-SQL language, the new query language designed to run on the Azure Data Lake Store. Announced in September of this year, the Data Lake Store is…

  • 5 trends in Big Data

    “Big data” is the common term for the exponential growth and availability of data, both structured and unstructured. Referring to it as ‘big’ data is perhaps something of an understatement – IBM estimated that 2.5 exabytes…

  • Too Big Data: Coping with Overplotting

    Scatter plots are a wonderful way of showing (apparent) relationships in bivariate data. Patterns and clusters that you wouldn't see in a huge block of data in a table can become instantly visible on a page or screen. With all the hype around Big Data…

  • Visual Explorations of Sample Size

    Drawing conclusions based on small samples is obviously problematic. At the same time, I also wonder whether the rise to prominence of "Big Data" can lead organisations to blindly collect as much data as possible rather than think logically about how…

  • Aspects of Datasets - Part 2

    This is the second (and final) article looking at key aspects of datasets. Having previously covered relevance, accuracy, and precision, here we will consider consistency, completeness and size.

    Consistency

    On the 23rd of September 1999, NASA's Mars Climate…

  • What is A PetaByte?

    According to Wikipedia: “A petabyte (derived from the SI prefix peta-) is a unit of information equal to one quadrillion (short scale) bytes, or 1 billiard (long scale) bytes. The unit symbol for the petabyte is PB. The prefix peta (P) indicates the…