RAM is the new disk – and how to measure its performance – Part 3 – CPU Instructions & Cycles

If you haven’t read the previous parts of this series yet, here are the links: [ Part 1 | Part 2 ].

A Refresher

In the first part of this series I said that RAM access is the slow component of a modern in-memory database engine and for performance you’d want to reduce RAM access as much as possible. Reduced memory traffic thanks to the new columnar data formats is the most important enabler for the awesome In-Memory processing performance and SIMD is just icing on the cake.

In the second part I also showed how to measure the CPU efficiency of your (Oracle) process using a Linux perf stat command. How well your applications actually utilize your CPU execution units depends on many factors. The biggest factor is your process’es cache efficiency that depends on the CPU cache size and your application’s memory access patterns. Regardless of what the OS CPU accounting tools like top or vmstat may show you, your “100% busy” CPUs may actually spend a significant amount of their cycles internally idle, with a stalled pipeline, waiting for some event (like a memory line arrival from RAM) to happen.

Luckily there are plenty of tools for measuring what’s actually going on inside the CPUs, thanks to modern processors having CPU Performance Counters (CPC) built in to them.

A key derived metric for understanding CPU-efficiency is the IPC (instructions per cycle). Years ago people were actually talking about the inverse metric CPI (cycles per instruction) as on average it took more than one CPU cycle to complete an instruction’s execution (again, due to the abovementioned reasons like memory stalls). However, thanks to today’s superscalar processors with out-of-order execution on a modern CPU’s multiple execution units – and with large CPU caches – a well-optimized application can execute multiple instructions per a single CPU cycle, thus it’s more natural to use the IPC (instructions-per-cycle) metric. With IPC, higher is better.

Continue reading

Posted in InMemory, Linux, Oracle | Leave a comment

Troubleshooting Another Complex Performance Issue – Oracle direct path inserts and SEG$ contention

Here’s an updated presentation I first delivered at Hotsos Symposium 2015.

It’s about lots of concurrent PX direct path insert ant CTAS statements that, when clashing with another bug/problem, caused various gc buffer busy waits and enq: TX – allocate ITL entry contention. This got amplified thanks to running this concurrent workload on 4 RAC nodes:

When reviewing these slides, I see there’s quite a lot that needs to be said in addition to what’s on slides, so this might just mean a (Powerpoint) hacking session some day!

Posted in Oracle | Leave a comment

My Oracle OpenWorld presentations

Oracle OpenWorld is just around the corner – I will have one presentation at OOW this year and another at the independent OTW event:

Connecting Oracle with Hadoop

Real-Time SQL Monitoring in Oracle Database 12c

  • Conference: OpenWorld
  • Time: Wednesday, 28 Oct, 3:00pm
  • Location: Moscone South 103
  • Abstract: Click here (sign up to guarantee a seat!)

I plan to hang out at the OTW venue on Monday and Tuesday, so see you there!


Posted in Announcement, Hadoop, Oracle, Oracle 12c | Leave a comment

Advanced Oracle Troubleshooting v2.5 (with 12c stuff too)

It took a while (1.5 years since my last class – I’ve been busy!), but I am ready with my Advanced Oracle Troubleshooting training (version 2.5) that has plenty of updates, including some more modern DB kernel tracing & ASH stuff and of course Oracle 12c topics!

The online training will take place on 16-20 November & 14-18 December 2015 (Part 1 and Part 2).

The latest TOC is below:

Seminar registration details:

A notable improvement of AOT v2.5: now attendees will get downloadable video recordings after the sessions for personal use! So, no crappy streaming with 14-day expiry date, you can download the video MP4 files straight to your computer or tablet and keep for your use forever!

I won’t be doing any other classes this year, but there will be some more (pleasant) surprises coming next year ;-)

See you soon!

Posted in Announcement, Oracle, Oracle 12c, Productivity | 4 Comments

RAM is the new disk – and how to measure its performance – Part 2 – Tools

[ part 1 | part 2 | part 3 ]

In the previous article I explained that the main requirement for high-speed in-memory data scanning is column-oriented storage format for in-memory data. SIMD instruction processing is just icing on the cake. Let’s dig deeper. This is a long post, you’ve been warned.

Test Environment

I will cover full test results in the next article in this series. First, let’s look into the test setup, environment and what tools I used for peeking inside CPU hardware.

I was running the tests on a relatively old machine with 2 CPU sockets, with 6-core CPUs in each socket (2s12c24t):

$ egrep "MHz|^model name" /proc/cpuinfo | sort | uniq -c
     24 cpu MHz		: 2926.171
     24 model name	: Intel(R) Xeon(R) CPU           X5670  @ 2.93GHz

The CPUs support SSE4.2 SIMD extensions (but not the newer AVX stuff):

$ grep ^flags /proc/cpuinfo | egrep "avx|sse|popcnt" | sed 's/ /\n/g' | egrep "avx|sse|popcnt" | sort | uniq

Even though the /proc/cpuinfo above shows the CPU clock frequency as 2.93GHz, these CPUs have Intel Turboboost feature that allows some cores run at up to 3.33GHz frequency when not all cores are fully busy and the CPUs aren’t too hot.

Indeed, the turbostat command below shows that the CPU core executing my Oracle process was running at 3.19GHz frequency:

# turbostat -p sleep 1
pk cor CPU    %c0  GHz  TSC SMI    %c1    %c3    %c6 CTMP   %pc3   %pc6
             6.43 3.02 2.93   0  93.57   0.00   0.00   59   0.00   0.00
 0   0   0   4.49 3.19 2.93   0  95.51   0.00   0.00   46   0.00   0.00
 0   1   1  10.05 3.19 2.93   0  89.95   0.00   0.00   50
 0   2   2   2.48 3.19 2.93   0  97.52   0.00   0.00   45
 0   8   3   2.05 3.19 2.93   0  97.95   0.00   0.00   44
 0   9   4   0.50 3.20 2.93   0  99.50   0.00   0.00   50
 0  10   5 100.00 3.19 2.93   0   0.00   0.00   0.00   59
 1   0   6   6.25 2.23 2.93   0  93.75   0.00   0.00   44   0.00   0.00
 1   1   7   3.93 2.04 2.93   0  96.07   0.00   0.00   43
 1   2   8   0.82 2.15 2.93   0  99.18   0.00   0.00   44
 1   8   9   0.41 2.48 2.93   0  99.59   0.00   0.00   41
 1   9  10   0.99 2.35 2.93   0  99.01   0.00   0.00   43
 1  10  11   0.76 2.36 2.93   0  99.24   0.00   0.00   44

I will come back to this CPU frequency turbo-boosting later when explaining some performance metrics.

I ran the experiments in Oct/Nov 2014, so used a relatively early Oracle version with a bundle patch (19189240) for in-memory stuff.

The test was deliberately very simple as I was researching raw in-memory scanning and filtering speed and was not looking into join/aggregation performance. I was running the query below with different hints and parameters to change access path options:

SELECT COUNT(cust_valid) FROM customers_nopart c WHERE cust_id > 0

Continue reading

Posted in InMemory, Oracle, Oracle 12c | 4 Comments

We are hiring!

Gluent – where I’m a cofounder & CEO – is hiring awesome developers and (big data) infrastructure specialists in US and UK!

We are still in stealth mode, so won’t be detailing publicly what exactly we are doing ;-)

However, it is evident that the modern data platforms (for example Hadoop) with their scalability, affordability-at-scale and freedom to use many different processing engines on open data formats are turning enterprise IT upside down.

This shift has already been going on for years in large internet & e-commerce companies and small startups, but now the shockwave is arriving to all traditional enterprises too. And every single one of them must accept it, in order to stay afloat and win in the new world.

Do you want to be part of the new world?



Posted in Announcement, Big Data, Hadoop | Leave a comment

RAM is the new disk – and how to measure its performance – Part 1 – Introduction

[ part 1 | part 2 | part 3 ]

RAM is the new disk, at least in the In-Memory computing world.

No, I am not talking about Flash here, but Random Access Memory – RAM as in SDRAM. I’m by far not the first one to say it. Jim Gray wrote this in 2006: “Tape is dead, disk is tape, flash is disk, RAM locality is king” (presentation)

Also, I’m not going to talk about how RAM is faster than disk (everybody knows that), but in fact how RAM is the slow component of an in-memory processing engine.

I will use Oracle’s In-Memory column store and the hardware performance counters in modern CPUs for drilling down into the low-level hardware performance metrics about CPU efficiency and memory access.

But let’s first get started by looking a few years into past into the old-school disk IO and index based SQL performance bottlenecks :)

Continue reading

Posted in InMemory, Oracle, Oracle 12c | Leave a comment