Gluent New World #03: Real Time Stream Processing in Modern Enterprises with Gwen Shapira

It’s time to announce the 3rd episode of the Gluent New World webinar series! This time Gwen Shapira will talk about Kafka as a key data infrastructure component of a modern enterprise. And I will ask questions from an old database guy’s viewpoint :)

Apache Kafka and Real Time Stream Processing

Speaker:

  • Gwen Shapira (Confluent)
  • Gwen is a system architect at Confluent helping customers achieve
    success with their Apache Kafka implementation. She has 15 years of
    experience working with code and customers to build scalable data
    architectures, integrating relational and big data technologies. She
    currently specializes in building real-time reliable data processing
    pipelines using Apache Kafka. Gwen is an Oracle ACE Director, an
    author of “Hadoop Application Architectures”, and a frequent presenter
    at industry conferences. Gwen is also a committer on the Apache Kafka
    and Apache Sqoop projects. When Gwen isn’t coding or building data
    pipelines, you can find her pedaling on her bike exploring the roads
    and trails of California, and beyond.

Time:

Abstract:

  • Modern businesses have data at their core, and this data is
    changing continuously. How can we harness this torrent of continuously
    changing data in real time? The answer is stream processing, and one
    system that has become a core hub for streaming data is Apache Kafka.
    This presentation will give a brief introduction to Apache Kafka and
    describe its usage as a platform for streaming data. It will explain
    how Kafka serves as a foundation for both streaming data pipelines and
    applications that consume and process real-time data streams. It will
    introduce some of the newer components of Kafka that help make this
    possible, including Kafka Connect, a framework for capturing continuous
    data streams, and Kafka Streams, a lightweight stream processing
    library. Finally, it will describe some of our favorite use cases of
    stream processing and how they solved interesting data scalability
    challenges.
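The stream-processing idea described in the abstract can be sketched as a toy in plain Python (this illustrates the concept only, not the Kafka Streams API): a generator stands in for a Kafka topic, and a consumer maintains a continuously updated per-key aggregate, the same shape of computation a Kafka Streams word-count application performs.

```python
# Toy illustration of stream processing: a generator stands in for a
# Kafka topic, and the consumer keeps a running per-key count, similar
# in spirit to a Kafka Streams word-count aggregation (hypothetical data).
from collections import Counter

def event_stream():
    """Simulated stream of (event_type, user) events."""
    for user in ["alice", "bob", "alice", "carol", "alice", "bob"]:
        yield ("click", user)

def count_by_user(stream):
    """Continuously updated aggregate, conceptually like a Kafka Streams KTable."""
    counts = Counter()
    for _event_type, user in stream:
        counts[user] += 1
    return counts

print(dict(count_by_user(event_stream())))
# prints {'alice': 3, 'bob': 2, 'carol': 1}
```

In a real deployment the stream is unbounded and the aggregate is queried while it keeps updating; the toy version simply drains a finite generator.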

Register:

See you soon!

Posted in Announcement, Big Data, Hadoop | Leave a comment

Gluent New World #02: SQL-on-Hadoop with Mark Rittman

Update: The video recording of this session is here:

Slides are here.

Other videos are available at Gluent video collection.


It’s time to announce the 2nd episode of the Gluent New World webinar series!

The Gluent New World webinar series is about modern data management: architectural trends in enterprise IT and technical fundamentals behind them.

GNW02: SQL-on-Hadoop : A bit of History, Current State-of-the-Art, and Looking towards the Future

Speaker:

  • This GNW episode is presented by none other than Mark Rittman, the co-founder & CTO of Rittman Mead and an all-around guru of enterprise BI!

Time:

  • Tue, Apr 19, 2016 12:00 PM – 1:15 PM CDT

Abstract:

Hadoop and NoSQL platforms initially focused on Java developers and slow but massively-scalable MapReduce jobs as an alternative to high-end but limited-scale analytics RDBMS engines. Apache Hive opened up Hadoop to non-programmers by adding a SQL query engine and relational-style metadata layered over raw HDFS storage, and since then open-source initiatives such as Hive Stinger, Cloudera Impala and Apache Drill along with proprietary solutions from closed-source vendors have extended SQL-on-Hadoop’s capabilities into areas such as low-latency ad-hoc queries, ACID-compliant transactions and schema-less data discovery – at massive scale and with compelling economics.

In this session we’ll focus on the technical foundations of SQL-on-Hadoop, first reviewing the basic platform Apache Hive provides and then looking in more detail at how ad-hoc querying, ACID-compliant transactions and data discovery engines work, along with the more specialised underlying storage that each of them now works best with. We’ll also take a look to the future to see how SQL querying, data integration and analytics are likely to come together over the next five years to make Hadoop the default platform for running mixed old-world/new-world analytics workloads.

Register:


If you missed the last GNW01: In-Memory Processing for Databases session, here are the video recordings and slides!

See you soon!


Posted in Announcement, Big Data, Hadoop, Oracle | Leave a comment

Gluent Demo Video Launch

Although we are still (kind of) in stealth mode, due to overwhelming requests for information we decided to publish a video about what we do :)

It’s a short 5-minute video, just click on the image below or go straight to http://gluent.com:

Gluent Demo video

And this, by the way, is just the beginning.

Gluent is getting close to 20 people now, with distributed teams in the US and UK – and we are still hiring!


Posted in Announcement, Big Data, Cool stuff, Hadoop, Oracle | 2 Comments

GNW01: In-Memory Processing for Databases

Hi, it took a bit longer than I had planned, but here’s the first Gluent New World webinar recording!

You can also subscribe to our new Vimeo channel here – I will announce the next event with another great speaker soon ;-)

A few comments:

  • Slides are here
  • I’ll figure out a good way to deal with offline follow-up Q&A later on, after we’ve done a few of these events

If you like this stuff, please share it too – let’s make this series totally awesome!


Posted in Big Data, Hadoop, InMemory, Oracle | Leave a comment

Gluent New World: In-Memory Processing for Databases

As Gluent is all about gluing together the old world and new world in enterprises, it’s time to announce the Gluent New World webinar series!

The Gluent New World sessions cover the important technical details behind new advancements in enterprise technologies that are arriving into mainstream use.

These seminars help you stay current with the major technology changes that are inevitably arriving at your company soon (if not already). You can make informed decisions about what to learn next – so that you are still relevant in your profession five years from now.

Think about software-defined storage, open data formats, cloud processing, in-memory computation, direct attached storage, all-flash and distributed stream processing – and this is just a start!

The speakers of this series are technical experts in their field – able to explain in detail how the technology works internally, which fundamental changes in the technology world have enabled these advancements and why it matters to all of you (not just the Googles and Facebooks out there).

I picked myself as the speaker for the first event in this series:

Gluent New World: In-Memory Processing for Databases

In this session, Tanel Poder will explain how the new CPU cache-efficient data processing methods help to radically speed up data processing workloads – on data stored in RAM and also read from disk! This is a technical session about internal CPU efficiency and cache-friendly data structures, using Oracle Database and Apache Arrow as examples.

Time:

  • Tue, Mar 22, 2016 1:00 PM – 2:00 PM CDT

Register:

After registering, you will receive a confirmation email containing information about joining the webinar.

See you soon!

Posted in Announcement, Big Data, InMemory, Oracle, Oracle 12c | 4 Comments

Public Appearances H1 2016

Here’s where I’ll hang out in the following months:

26-28 January 2016: BIWA Summit 2016 in Redwood Shores, CA

10-11 February 2016: RMOUG Training Days in Denver, CO

25 February 2016: Yorkshire Database (YoDB) in Leeds, UK

6-10 March 2016: Hotsos Symposium, Dallas, TX

10-14 April 2016: IOUG Collaborate, Las Vegas, NV

  • Beer session: Not speaking myself, but planning to hang out for the first couple of conference days, drink beer and attend Gluent colleague Maxym Kharchenko’s presentations

24-26 April 2016: Enkitec E4, Barcelona, Spain

18-19 May 2016: Great Lakes Oracle Conference (GLOC) in Cleveland, OH

  • I plan to submit abstracts (and hope to get some accepted :)
  • The abstract submission is still open until 1st February 2016

2-3 June 2016: AMIS 25 – Beyond the Horizon near Leiden, Netherlands

  • This AMIS 25th anniversary event will take place in a pretty cool location – an old military airport hangar (and abstract submission is still open :)
  • I plan to deliver 2 presentations, one about the usual Oracle performance stuff I do and one about Hadoop

5-7 June 2016: Enkitec E4, Dallas, TX


As you can see, I have changed my “I don’t want to travel anymore” policy ;-)


Posted in Announcement | 2 Comments

Gluent launch! New production release, new HQ, new website!

I’m happy to announce that the hard work of the last couple of years is paying off and the Gluent Offload Engine is now in production! After beta testing with our early customers, we are now out of complete stealth mode and are ready to talk more about what exactly we are doing :-)

Check out our new website and product & use case info here!

Follow us on Twitter:

We are hiring! Need to fill that new Dallas World HQ ;-) Our distributed teams around the US and in London need more helping hands (and brains!) too.

You’ll be hearing more of us soon :-)

Paul & Tanel just moved in to Gluent World HQ
Posted in Announcement, Big Data, Cool stuff, Hadoop, Oracle | 8 Comments

RAM is the new disk – and how to measure its performance – Part 3 – CPU Instructions & Cycles

If you haven’t read the previous parts of this series yet, here are the links: [ Part 1 | Part 2 ].

A Refresher

In the first part of this series I said that RAM access is the slow component of a modern in-memory database engine and that for performance you’d want to reduce RAM access as much as possible. Reduced memory traffic, thanks to the new columnar data formats, is the most important enabler of the awesome In-Memory processing performance; SIMD is just the icing on the cake.
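The memory-traffic argument can be sketched with a toy Python example (hypothetical data; real engines use packed binary arrays rather than Python lists, so this only illustrates the access pattern): a scan of a single attribute over a row-oriented layout drags every row’s unrelated fields along, while a columnar layout stores each attribute contiguously, so the same scan touches only the bytes it needs.

```python
# Toy sketch of row-oriented vs. columnar layouts (hypothetical data).
# Scanning one attribute in the row store touches every field of every
# (wide) row; the column store keeps each attribute contiguous, so the
# same scan reads only the data it needs -- far less memory traffic.
row_store = [
    {"id": i, "price": float(i), "qty": i % 10, "note": "x" * 100}
    for i in range(1000)
]

# Columnar layout: one contiguous sequence per attribute.
col_store = {
    "id":    [r["id"] for r in row_store],
    "price": [r["price"] for r in row_store],
    "qty":   [r["qty"] for r in row_store],
}

# SELECT SUM(price): the row-store scan walks every wide row ...
row_sum = sum(r["price"] for r in row_store)
# ... while the column-store scan touches just the price column.
col_sum = sum(col_store["price"])

assert row_sum == col_sum == 499500.0
```

Both scans compute the same answer; the difference that matters to the CPU is how many cache lines of unrelated data the row-store version pulls through the memory hierarchy along the way.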

In the second part I also showed how to measure the CPU efficiency of your (Oracle) process using the Linux perf stat command. How well your applications actually utilize your CPU execution units depends on many factors. The biggest factor is your process’s cache efficiency, which depends on the CPU cache size and your application’s memory access patterns. Regardless of what OS CPU accounting tools like top or vmstat may show you, your “100% busy” CPUs may actually spend a significant amount of their cycles internally idle, with a stalled pipeline, waiting for some event (like a memory line arriving from RAM) to happen.

Luckily there are plenty of tools for measuring what’s actually going on inside the CPUs, thanks to modern processors having CPU Performance Counters (CPC) built in to them.

A key derived metric for understanding CPU efficiency is IPC (instructions per cycle). Years ago people actually talked about the inverse metric, CPI (cycles per instruction), as on average it took more than one CPU cycle to complete an instruction’s execution (again, due to the aforementioned reasons like memory stalls). However, thanks to today’s superscalar processors with out-of-order execution across a modern CPU’s multiple execution units – and with large CPU caches – a well-optimized application can execute multiple instructions in a single CPU cycle, so it’s more natural to use the IPC (instructions per cycle) metric. With IPC, higher is better.
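As a worked example with made-up counter values (the kind perf stat reports as “instructions” and “cycles”), IPC is simply instructions divided by cycles, and CPI is its inverse:

```python
# IPC vs. CPI from raw CPU performance counter values.
# The numbers below are made up for illustration; in practice they come
# from "perf stat" output ("instructions" and "cycles" counters).
def ipc(instructions, cycles):
    """Instructions per cycle: higher is better."""
    return instructions / cycles

instructions = 1_000_000_000
cycles = 2_500_000_000

print(f"IPC = {ipc(instructions, cycles):.2f}")  # IPC = 0.40: pipeline mostly stalled
print(f"CPI = {cycles / instructions:.2f}")      # CPI = 2.50 cycles per instruction

# A cache-friendly workload on a superscalar CPU might instead show an
# IPC well above 1, i.e. multiple instructions retired per cycle.
```

An IPC this far below 1 on a “100% busy” CPU is exactly the stalled-pipeline situation described above: the OS sees the core as busy, but most cycles complete no instruction.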

Continue reading

Posted in InMemory, Linux, Oracle | 6 Comments