RAM is the new disk – and how to measure its performance – Part 2 – Tools

In the previous article I explained that the main requirement for high-speed in-memory data scanning is column-oriented storage format for in-memory data. SIMD instruction processing is just icing on the cake. Let’s dig deeper. This is a long post, you’ve been warned.

Test Environment

I will cover full test results in the next article in this series. First, let’s look into the test setup, environment and what tools I used for peeking inside CPU hardware.

I was running the tests on a relatively old machine with 2 CPU sockets, with 6-core CPUs in each socket (2s12c24t):

$ egrep "MHz|^model name" /proc/cpuinfo | sort | uniq -c
     24 cpu MHz		: 2926.171
     24 model name	: Intel(R) Xeon(R) CPU           X5670  @ 2.93GHz

The CPUs support SSE4.2 SIMD extensions (but not the newer AVX stuff):

$ grep ^flags /proc/cpuinfo | egrep "avx|sse|popcnt" | sed 's/ /\n/g' | egrep "avx|sse|popcnt" | sort | uniq

Even though the /proc/cpuinfo above shows the CPU clock frequency as 2.93GHz, these CPUs have Intel Turboboost feature that allows some cores run at up to 3.33GHz frequency when not all cores are fully busy and the CPUs aren’t too hot.

Indeed, the turbostat command below shows that the CPU core executing my Oracle process was running at 3.19GHz frequency:

# turbostat -p sleep 1
pk cor CPU    %c0  GHz  TSC SMI    %c1    %c3    %c6 CTMP   %pc3   %pc6
             6.43 3.02 2.93   0  93.57   0.00   0.00   59   0.00   0.00
 0   0   0   4.49 3.19 2.93   0  95.51   0.00   0.00   46   0.00   0.00
 0   1   1  10.05 3.19 2.93   0  89.95   0.00   0.00   50
 0   2   2   2.48 3.19 2.93   0  97.52   0.00   0.00   45
 0   8   3   2.05 3.19 2.93   0  97.95   0.00   0.00   44
 0   9   4   0.50 3.20 2.93   0  99.50   0.00   0.00   50
 0  10   5 100.00 3.19 2.93   0   0.00   0.00   0.00   59
 1   0   6   6.25 2.23 2.93   0  93.75   0.00   0.00   44   0.00   0.00
 1   1   7   3.93 2.04 2.93   0  96.07   0.00   0.00   43
 1   2   8   0.82 2.15 2.93   0  99.18   0.00   0.00   44
 1   8   9   0.41 2.48 2.93   0  99.59   0.00   0.00   41
 1   9  10   0.99 2.35 2.93   0  99.01   0.00   0.00   43
 1  10  11   0.76 2.36 2.93   0  99.24   0.00   0.00   44

I will come back to this CPU frequency turbo-boosting later when explaining some performance metrics.

I ran the experiments in Oct/Nov 2014, so used a relatively early Oracle version with a bundle patch (19189240) for in-memory stuff.

The test was deliberately very simple as I was researching raw in-memory scanning and filtering speed and was not looking into join/aggregation performance. I was running the query below with different hints and parameters to change access path options:

SELECT COUNT(cust_valid) FROM customers_nopart c WHERE cust_id > 0

Continue reading

Posted in InMemory, Oracle, Oracle 12c | 2 Comments

We are hiring!

Gluent – where I’m a cofounder & CEO – is hiring awesome developers and (big data) infrastructure specialists in US and UK!

We are still in stealth mode, so won’t be detailing publicly what exactly we are doing ;-)

However, it is evident that the modern data platforms (for example Hadoop) with their scalability, affordability-at-scale and freedom to use many different processing engines on open data formats are turning enterprise IT upside down.

This shift has already been going on for years in large internet & e-commerce companies and small startups, but now the shockwave is arriving to all traditional enterprises too. And every single one of them must accept it, in order to stay afloat and win in the new world.

Do you want to be part of the new world?



Posted in Announcement, Big Data, Hadoop | Leave a comment

RAM is the new disk – and how to measure its performance – Part 1 – Introduction

RAM is the new disk, at least in the In-Memory computing world.

No, I am not talking about Flash here, but Random Access Memory – RAM as in SDRAM. I’m by far not the first one to say it. Jim Gray wrote this in 2006: “Tape is dead, disk is tape, flash is disk, RAM locality is king” (presentation)

Also, I’m not going to talk about how RAM is faster than disk (everybody knows that), but in fact how RAM is the slow component of an in-memory processing engine.

I will use Oracle’s In-Memory column store and the hardware performance counters in modern CPUs for drilling down into the low-level hardware performance metrics about CPU efficiency and memory access.

But let’s first get started by looking a few years into past into the old-school disk IO and index based SQL performance bottlenecks :)

Continue reading

Posted in InMemory, Oracle, Oracle 12c | Leave a comment

The Hybrid World is Coming

Here’s the video of E4 keynote we delivered together with Kerry Osborne a few weeks ago.

It explains what we see is coming, at a high level, from long time Oracle database professionals’ viewpoint and using database terminology (as the E4 audience is all Oracle users like us).

However, this change is not really about Oracle database world, it’s about a much wider shift in enterprise computing: modern Hadoop data lakes and clouds are here to stay. They are already taking over many workloads traditionally executed on in-house RDBMS systems on SAN storage arrays – especially all kinds of reporting and analytics. Oracle is just one of the many vendors affected by all this and they’ve also jumped onto the Hadoop bandwagon.

However, it would be naive to to think that Hadoop would somehow replace all your transactional or ERP systems or existing application code with thousands of complex SQL reports. Many of the traditional systems aren’t going away any time soon.

But the hybrid world is coming. It’s been a very good idea for Oracle DBAs to additionally learn Linux over the last 5-10 years, now is pretty much the right time to start learning Hadoop too. More about this in a future article ;-)

Check out the keynote video here:

Enjoy :-)

Update 1: There’s a recently published good WSJ article which pretty much echoes how we see things happening in the hybrid world.

Update 2: On a more technical note, here’s an article about a Barclays use case in Cloudera blog that illustrates what I meant by pushing your complex (and custom) logic down into Hadoop stack with Spark as opposed to using some separate application server system that pulls in datasets for further processing via SQL.

Posted in Announcement, Big Data, Hadoop, Oracle | 3 Comments

Old ventures and new adventures

I have some news, two items actually.

First, today (it’s still 18th June in California) is my blog’s 8th anniversary!

I wrote my first blog post, about Advanced Oracle Troubleshooting, exactly 8 years ago, on 18th June 2007 and have written 229 blog posts since. I had started writing and accumulating my TPT script collection a couple of years earlier and now it has over 1000 files in it! And no, I don’t remember what all of them do and even why I had written them. Also I haven’t yet created an index/documentation for all of them (maybe on the 10th anniversary? ;)

Thanks everyone for your support, reading, commenting and the ideas we’ve exchanged over all these years, it’s been awesome to learn something new every single day!

You may have noticed that I haven’t been too active in online forums nor blogging much in the last couple of years, which brings me to the second news item(s):

I’ve been heavily focusing on Hadoop. It is the future. It will win, for the same reasons Linux won. I moved to US over a year ago and am currently in San Francisco. The big data hype is the biggest here. Except it’s not hype anymore; and Hadoop is getting enterprise-ready.

I am working on a new startup. I am the CEO who still occasionally troubleshoots stuff (must learn something new every day!). We officially incorporated some months ago, but our first developers in Dallas and London have been busy in the background for over a year. By now we are beta testing with our most progressive customers ;-) We are going to be close partners with old and new friends in modern data management space and especially the awesome folks in Accenture Enkitec Group.

The name is Gluent. We glue together the old and new worlds in enterprise IT. Relational databases vs. Hadoop. Legacy ETL vs. Spark. SAN storage vs. the cloud. Jungles of data feeds vs. a data lake. I’m not going to tell you any more as we are still in stealth mode ;-)

Now, where does this leave Oracle technology? Well, I think it still kicks ass and it ain’t going away! In fact we are betting on it. Hadoop is here to stay, but your existing systems aren’t going away any time soon.

I wouldn’t want to run my critical ERP or complex transactional systems on anything other than Oracle. Want real time in-memory reporting on your existing Oracle OLTP system – with immediate consistency, not a multi-second lag: Oracle. Oracle is the king of complex OLTP and I don’t see it changing soon.

So, thanks for reading all the way to the end – and expect to hear much more about Gluent in the future! You can follow @GluentInc Twitter handle to be the first to hear any further news :-)


Posted in Announcement, Big Data, Cool stuff, Hadoop, Oracle | 17 Comments

Advanced Oracle Troubleshooting Guide – Part 12: control file reads causing enq: SQ – contention waits?

Vishal Desai systematically troubleshooted an interesting case where the initial symptoms of the problem showed a spike of enq: SQ – contention waits, but he dug deeper – and found the root cause to be quite different. He followed the blockers of waiting sessions manually to reach the root cause – and also used my @ash/ash_wait_chains.sql and @ash/event_hist.sql scripts to extract the same information more conveniently (note that he had modified the scripts to take AWR snap_ids as time range parameters instead of the usual date/timestamp):

Definitely worth a read if you’re into troubleshooting non-trivial performance problems :)

Posted in Oracle | 8 Comments

Sqlplus is my second home, part 8: Embedding multiple sqlplus arguments into one variable

I’ve updated some of my ASH scripts to use these 4 arguments in a standard way:

  1. What ASH columns to display (and aggregate by)
  2. Which ASH rows to use for the report (filter)
  3. Time range start
  4. Time range end

So this means whenever I run ashtop (or dashtop) for example, I need to type in all 4 parameters. The example below would show top SQL_IDs only for user SOE sessions from last hour of ASH samples:

SQL> @ashtop sql_id username='SOE' sysdate-1/24 sysdate

  Seconds     AAS %This   SQL_ID        FIRST_SEEN          LAST_SEEN           DIST_SQLEXEC_SEEN
--------- ------- ------- ------------- ------------------- ------------------- -----------------
     2271      .6   21% | 56pwkjspvmg3h 2015-03-29 13:13:16 2015-03-29 13:43:34               145
     2045      .6   19% | gkxxkghxubh1a 2015-03-29 13:13:16 2015-03-29 13:43:14               149
     1224      .3   11% | 29qp10usqkqh0 2015-03-29 13:13:25 2015-03-29 13:43:32               132
      959      .3    9% | c13sma6rkr27c 2015-03-29 13:13:19 2015-03-29 13:43:34               958
      758      .2    7% |               2015-03-29 13:13:16 2015-03-29 13:43:31                 1

When I want more control and specify a fixed time range, I can just use the ANSI TIMESTAMP (or TO_DATE) syntax:

SQL> @ashtop sql_id username='SOE' "TIMESTAMP'2015-03-29 13:00:00'" "TIMESTAMP'2015-03-29 13:15:00'"

  Seconds     AAS %This   SQL_ID        FIRST_SEEN          LAST_SEEN           DIST_SQLEXEC_SEEN
--------- ------- ------- ------------- ------------------- ------------------- -----------------
      153      .2   22% | 56pwkjspvmg3h 2015-03-29 13:13:29 2015-03-29 13:14:59                 9
      132      .1   19% | gkxxkghxubh1a 2015-03-29 13:13:29 2015-03-29 13:14:59                 8
       95      .1   14% | 29qp10usqkqh0 2015-03-29 13:13:29 2015-03-29 13:14:52                 7
       69      .1   10% | c13sma6rkr27c 2015-03-29 13:13:31 2015-03-29 13:14:58                69
       41      .0    6% |               2015-03-29 13:13:34 2015-03-29 13:14:59                 1

Note that the arguments 3 & 4 above are in double quotes as there’s a space within the timestamp value. Without the double-quotes, sqlplus would think the script has total 6 arguments due to the spaces.

I don’t like to type too much though (every character counts!) so I was happy to see that the following sqlplus hack works. I just defined pairs of arguments as sqlplus DEFINE variables as seen below (also in init.sql now):

  -- geeky shorcuts for producing date ranges for various ASH scripts
  define     min="sysdate-1/24/60 sysdate"
  define  minute="sysdate-1/24/60 sysdate"
  define    5min="sysdate-1/24/12 sysdate"
  define    hour="sysdate-1/24 sysdate"
  define   2hours="sysdate-1/12 sysdate"
  define  24hours="sysdate-1 sysdate"
  define      day="sysdate-1 sysdate"
  define    today="TRUNC(sysdate) sysdate"

And now I can type just 3 arguments instead of 4 when I run some of my scripts and want some predefined behavior like seeing last 5 minutes’ activity:

SQL> @ashtop sql_id username='SOE' &5min

  Seconds     AAS %This   SQL_ID        FIRST_SEEN          LAST_SEEN           DIST_SQLEXEC_SEEN
--------- ------- ------- ------------- ------------------- ------------------- -----------------
      368     1.2   23% | gkxxkghxubh1a 2015-03-29 13:39:34 2015-03-29 13:44:33                37
      241      .8   15% | 56pwkjspvmg3h 2015-03-29 13:40:05 2015-03-29 13:44:33                25
      185      .6   12% | 29qp10usqkqh0 2015-03-29 13:39:40 2015-03-29 13:44:33                24
      129      .4    8% | c13sma6rkr27c 2015-03-29 13:39:35 2015-03-29 13:44:32               129
      107      .4    7% |               2015-03-29 13:39:34 2015-03-29 13:44:33                 1

That’s it, I hope this hack helps :-)

By the way – if you’re a command line & sqlplus fan, check out the SQLCL command line “new sqlplus” tool from the SQL Developer team! (you can download it from the SQL Dev early adopter page for now).


Posted in Oracle, Productivity | 1 Comment

Oracle Exadata Performance: Latest Improvements and Less Known Features

Here are the slides of a presentation I did at the IOUG Virtual Exadata conference in February. I’m explaining the basics of some new Oracle 12c things related to Exadata, plus current latest cellsrv improvements like Columnar Flash Cache and IO skipping for Min/Max retrieval using Storage Indexes:

Note that Christian Antognini and Roger MacNicol have written separate articles about some new features:



Posted in Exadata, Oracle, Oracle 12c | Leave a comment

Public Appearances 2015

Here’s where I’ll hang out in the following months:

11-12 Feb 2015: IOUG Exadata SIG Virtual Conference (free online event)

  • Presentation: Exadata Performance: Latest Improvements and Less Known Features
  • It’s a free online event, so sign up here

18-19 Feb 2015: RMOUG Training Days (in Denver)

  • I won’t speak there this year, but plan to hang out on Wednesday evening and drink beer
  • More info here

1-5 March 2015: Hotsos Symposium 2015

31 May – 2 June 2015: Enkitec E4

  • Even more awesome Exadata (and now also Hadoop) content there!
  • I plan to speak there again, about Exadata performance and/or integrating Oracle databases with Hadoop
  • More info here

Advanced Oracle Troubleshooting v3.0 training

  • One of the reasons why I’ve been so quiet in recent months is that I’ve been rebuilding my entire Advanced Oracle Troubleshooting training material from ground up.
  • This new seminar focuses on systematic Oracle troubleshooting and internals of database versions all the way to Oracle 12c.
  • I will launch the AOT seminar v3.0 in early March – you can already register your interest here!


Posted in Announcement, Oracle | Leave a comment

Oracle In-Memory Column Store Internals – Part 1 – Which SIMD extensions are getting used?

This is the first entry in a series of random articles about some useful internals-to-know of the awesome Oracle Database In-Memory column store. I intend to write about Oracle’s IM stuff that’s not already covered somewhere else and also about some general CPU topics (that are well covered elsewhere, but not always so well known in the Oracle DBA/developer world).

Before going into further details, you might want to review the Part 0 of this series and also our recent Oracle Database In-Memory Option in Action presentation with some examples. And then read this doc by Intel if you want more info on how the SIMD registers and instructions get used.

There’s a lot of talk about the use of your CPUs’ SIMD vector processing capabilities in the Oracle inmemory module, let’s start by checking if it’s enabled in your database at all. We’ll look into Linux/Intel examples here.

Continue reading

Posted in InMemory, Oracle, Oracle 12c | 22 Comments