Introduction

hwloc provides command line tools and a C API to obtain the hierarchical map of
key computing elements, such as: NUMA memory nodes, shared caches, processor
sockets, processor cores, and processing units (logical processors or
"threads"). hwloc also gathers various attributes such as cache and memory
information, and is portable across a variety of different operating systems
and platforms.

hwloc primarily aims at helping high-performance computing (HPC) applications,
but is also applicable to any project seeking to exploit code and/or data
locality on modern computing platforms.

Note that the hwloc project represents the merger of the libtopology project
from INRIA and the Portable Linux Processor Affinity (PLPA) sub-project from
Open MPI. Both of these prior projects are now deprecated. The first hwloc
release was essentially a "re-branding" of the libtopology code base, but with
both a few genuinely new features and a few PLPA-like features added in. Prior
releases of hwloc included documentation about switching from PLPA to hwloc;
this documentation has been dropped on the assumption that everyone who was
using PLPA has already switched to hwloc.

hwloc supports the following operating systems:

  * Linux (including old kernels not having sysfs topology information, with
    knowledge of cpusets, offline CPUs, ScaleMP vSMP, and Kerrighed support)
  * Solaris
  * AIX
  * Darwin / OS X
  * FreeBSD and its variants, such as kFreeBSD/GNU
  * OSF/1 (a.k.a., Tru64)
  * HP-UX
  * Microsoft Windows

hwloc only reports the number of processors on unsupported operating systems;
no topology information is available.

For development and debugging purposes, hwloc also offers the ability to work
on "fake" topologies:

  * Symmetrical tree of resources generated from a list of level arities
  * Remote machine simulation through the gathering of Linux sysfs topology
    files
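
As a rough illustration of the first kind of fake topology, a list of level
arities is enough to determine how many objects sit at each depth of a
symmetrical tree. The C sketch below is not part of hwloc;
objects_per_level() is a hypothetical helper that only does the arithmetic.
For arities 4, 2, 2 (4 sockets, 2 cores per socket, 2 PUs per core) it yields
1, 4, 8, and 16 objects.

```c
#include <assert.h>
#include <stddef.h>

/* Fill counts[0..n_levels] with the number of objects at each depth of a
 * symmetrical tree in which every object at depth i has arities[i] children.
 * counts must have room for n_levels + 1 entries (the root is depth 0).
 * Hypothetical helper for illustration; not part of the hwloc API. */
void objects_per_level(const unsigned *arities, size_t n_levels,
                       unsigned long *counts)
{
    size_t i;

    assert(arities != NULL && counts != NULL);
    counts[0] = 1; /* a single root object */
    for (i = 0; i < n_levels; i++)
        counts[i + 1] = counts[i] * arities[i];
}
```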

hwloc can display the topology in a human-readable format, either in graphical
mode (X11), or by exporting in one of several different formats, including:
plain text, PDF, PNG, and FIG (see CLI Examples below). Note that some of the
export formats require additional support libraries.

hwloc offers a programming interface for manipulating topologies and objects.
It also brings a powerful CPU bitmap API that is used to describe the location
of topology objects on physical/logical processors. See the Programming
Interface section below. The API may also be used to bind applications onto
certain cores or memory nodes. Several utility programs are also provided to
ease command-line manipulation of topology objects, binding of processes, and
so on.

Installation

hwloc (http://www.open-mpi.org/projects/hwloc/) is available under the BSD
license. It is hosted as a sub-project of the overall Open MPI project
(http://www.open-mpi.org/). Note that hwloc does not require any functionality
from Open MPI -- it is a wholly separate (and much smaller!) project and code
base. It just happens to be hosted as part of the overall Open MPI project.

Nightly development snapshots are available on the web site. Additionally, the
code can be directly checked out of Subversion:

shell$ svn checkout http://svn.open-mpi.org/svn/hwloc/trunk hwloc-trunk
shell$ cd hwloc-trunk
shell$ ./autogen.sh

Note that GNU Autoconf >=2.63, Automake >=1.10 and Libtool >=2.2.6 are required
when building from a Subversion checkout.

Installation by itself is the fairly common GNU-based process:

shell$ ./configure --prefix=...
shell$ make
shell$ make install

The hwloc command-line tool "lstopo" produces human-readable topology maps, as
mentioned above. It can also export maps to the "fig" file format. Support for
PDF, PostScript, and PNG exporting is provided if the "Cairo" development
package can be found when hwloc is configured and built. Similarly, lstopo's
XML support requires the libxml2 development package.

CLI Examples

On a 4-socket 2-core machine with hyperthreading, the lstopo tool may show the
following graphical output:

                               dudley.png

Here's the equivalent output in textual form:

Machine (16GB)
  Socket L#0 + L3 L#0 (4096KB)
    L2 L#0 (1024KB) + L1 L#0 (16KB) + Core L#0
      PU L#0 (P#0)
      PU L#1 (P#8)
    L2 L#1 (1024KB) + L1 L#1 (16KB) + Core L#1
      PU L#2 (P#4)
      PU L#3 (P#12)
  Socket L#1 + L3 L#1 (4096KB)
    L2 L#2 (1024KB) + L1 L#2 (16KB) + Core L#2
      PU L#4 (P#1)
      PU L#5 (P#9)
    L2 L#3 (1024KB) + L1 L#3 (16KB) + Core L#3
      PU L#6 (P#5)
      PU L#7 (P#13)
  Socket L#2 + L3 L#2 (4096KB)
    L2 L#4 (1024KB) + L1 L#4 (16KB) + Core L#4
      PU L#8 (P#2)
      PU L#9 (P#10)
    L2 L#5 (1024KB) + L1 L#5 (16KB) + Core L#5
      PU L#10 (P#6)
      PU L#11 (P#14)
  Socket L#3 + L3 L#3 (4096KB)
    L2 L#6 (1024KB) + L1 L#6 (16KB) + Core L#6
      PU L#12 (P#3)
      PU L#13 (P#11)
    L2 L#7 (1024KB) + L1 L#7 (16KB) + Core L#7
      PU L#14 (P#7)
      PU L#15 (P#15)
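
Note that logical indexes (L#) and OS/physical indexes (P#) differ in this
listing. As a sketch only, the following C helper reproduces the P# values
shown above from the L# values. The function name and the formula are
assumptions read off this one sample machine; other machines number their PUs
differently, so real code should query hwloc instead of hard-coding a formula.

```c
#include <assert.h>

/* Map a logical PU index (L#) to the OS/physical index (P#) for the
 * 4-socket, 2-core, 2-thread sample machine listed above ONLY.
 * The formula is read off that listing; it is not a general rule. */
unsigned sample_physical_index(unsigned logical)
{
    unsigned socket, core, thread;

    assert(logical < 16);
    socket = logical / 4;     /* 4 PUs per socket */
    core = (logical / 2) % 2; /* 2 cores per socket */
    thread = logical % 2;     /* 2 PUs per core */
    return socket + 4 * core + 8 * thread;
}
```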

Finally, here's the equivalent output in XML. Long lines were artificially
broken for document clarity (in the real output, each XML tag is on a single
line), and only socket #0 is shown for brevity:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topology SYSTEM "hwloc.dtd">
<topology>
  <object type="Machine" os_level="-1" os_index="0" cpuset="0x0000ffff"
   complete_cpuset="0x0000ffff" online_cpuset="0x0000ffff"
   allowed_cpuset="0x0000ffff"
   dmi_board_vendor="Dell Computer Corporation" dmi_board_name="0RD318"
   local_memory="16648183808">
 <page_type size="4096" count="4064498"/>
 <page_type size="2097152" count="0"/>
 <object type="Socket" os_level="-1" os_index="0" cpuset="0x00001111"
     complete_cpuset="0x00001111" online_cpuset="0x00001111"
     allowed_cpuset="0x00001111">
   <object type="Cache" os_level="-1" cpuset="0x00001111"
       complete_cpuset="0x00001111" online_cpuset="0x00001111"
       allowed_cpuset="0x00001111" cache_size="4194304" depth="3"
       cache_linesize="64">
     <object type="Cache" os_level="-1" cpuset="0x00000101"
         complete_cpuset="0x00000101" online_cpuset="0x00000101"
         allowed_cpuset="0x00000101" cache_size="1048576" depth="2"
         cache_linesize="64">
       <object type="Cache" os_level="-1" cpuset="0x00000101"
           complete_cpuset="0x00000101" online_cpuset="0x00000101"
           allowed_cpuset="0x00000101" cache_size="16384" depth="1"
           cache_linesize="64">
         <object type="Core" os_level="-1" os_index="0" cpuset="0x00000101"
             complete_cpuset="0x00000101" online_cpuset="0x00000101"
             allowed_cpuset="0x00000101">
           <object type="PU" os_level="-1" os_index="0" cpuset="0x00000001"
               complete_cpuset="0x00000001" online_cpuset="0x00000001"
               allowed_cpuset="0x00000001"/>
           <object type="PU" os_level="-1" os_index="8" cpuset="0x00000100"
               complete_cpuset="0x00000100" online_cpuset="0x00000100"
               allowed_cpuset="0x00000100"/>
         </object>
       </object>
     </object>
     <object type="Cache" os_level="-1" cpuset="0x00001010"
         complete_cpuset="0x00001010" online_cpuset="0x00001010"
         allowed_cpuset="0x00001010" cache_size="1048576" depth="2"
         cache_linesize="64">
       <object type="Cache" os_level="-1" cpuset="0x00001010"
           complete_cpuset="0x00001010" online_cpuset="0x00001010"
           allowed_cpuset="0x00001010" cache_size="16384" depth="1"
           cache_linesize="64">
         <object type="Core" os_level="-1" os_index="1" cpuset="0x00001010"
             complete_cpuset="0x00001010" online_cpuset="0x00001010"
             allowed_cpuset="0x00001010">
           <object type="PU" os_level="-1" os_index="4" cpuset="0x00000010"
               complete_cpuset="0x00000010" online_cpuset="0x00000010"
               allowed_cpuset="0x00000010"/>
           <object type="PU" os_level="-1" os_index="12" cpuset="0x00001000"
               complete_cpuset="0x00001000" online_cpuset="0x00001000"
               allowed_cpuset="0x00001000"/>
         </object>
       </object>
     </object>
   </object>
 </object>
 <!-- ...other sockets listed here ... -->
  </object>
</topology>
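
In this sample, each cpuset attribute reads as a bitmask in which bit i stands
for the PU whose os_index is i: socket #0 has cpuset 0x00001111 and contains
PUs 0, 4, 8, and 12. The C sketch below decodes such a string.
cpuset_to_indexes() is a hypothetical helper, not an hwloc function, and it
assumes the mask fits in a single unsigned long as in the samples here;
programs using the hwloc API would normally rely on hwloc/bitmap.h instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Decode a single-word hexadecimal cpuset string such as "0x00001111"
 * into the OS indexes of the PUs it contains. Writes at most cap indexes
 * to out and returns how many bits are set in total.
 * Assumption: the mask fits in one unsigned long, as in the samples. */
size_t cpuset_to_indexes(const char *mask, unsigned *out, size_t cap)
{
    unsigned long bits;
    unsigned index = 0;
    size_t n = 0;

    assert(mask != NULL && (out != NULL || cap == 0));
    bits = strtoul(mask, NULL, 16); /* base 16 accepts a leading "0x" */
    for (; bits != 0; bits >>= 1, index++) {
        if (bits & 1UL) {
            if (n < cap)
                out[n] = index;
            n++;
        }
    }
    return n;
}
```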

On a 4-socket 2-core Opteron NUMA machine, the lstopo tool may show the
following graphical output:

                               hagrid.png

Here's the equivalent output in textual form:

Machine (32GB)
  NUMANode L#0 (P#0 8190MB) + Socket L#0
    L2 L#0 (1024KB) + L1 L#0 (64KB) + Core L#0 + PU L#0 (P#0)
    L2 L#1 (1024KB) + L1 L#1 (64KB) + Core L#1 + PU L#1 (P#1)
  NUMANode L#1 (P#1 8192MB) + Socket L#1
    L2 L#2 (1024KB) + L1 L#2 (64KB) + Core L#2 + PU L#2 (P#2)
    L2 L#3 (1024KB) + L1 L#3 (64KB) + Core L#3 + PU L#3 (P#3)
  NUMANode L#2 (P#2 8192MB) + Socket L#2
    L2 L#4 (1024KB) + L1 L#4 (64KB) + Core L#4 + PU L#4 (P#4)
    L2 L#5 (1024KB) + L1 L#5 (64KB) + Core L#5 + PU L#5 (P#5)
  NUMANode L#3 (P#3 8192MB) + Socket L#3
    L2 L#6 (1024KB) + L1 L#6 (64KB) + Core L#6 + PU L#6 (P#6)
    L2 L#7 (1024KB) + L1 L#7 (64KB) + Core L#7 + PU L#7 (P#7)

And here's the equivalent output in XML. Similar to above, line breaks were
added and only PU #0 is shown for brevity:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topology SYSTEM "hwloc.dtd">
<topology>
  <object type="Machine" os_level="-1" os_index="0" cpuset="0x000000ff"
   complete_cpuset="0x000000ff" online_cpuset="0x000000ff"
   allowed_cpuset="0x000000ff" nodeset="0x000000ff"
   complete_nodeset="0x000000ff" allowed_nodeset="0x000000ff"
   dmi_board_vendor="TYAN Computer Corp" dmi_board_name="S4881 ">
 <page_type size="4096" count="0"/>
 <page_type size="2097152" count="0"/>
 <object type="NUMANode" os_level="-1" os_index="0" cpuset="0x00000003"
     complete_cpuset="0x00000003" online_cpuset="0x00000003"
     allowed_cpuset="0x00000003" nodeset="0x00000001"
     complete_nodeset="0x00000001" allowed_nodeset="0x00000001"
     local_memory="7514177536">
   <page_type size="4096" count="1834516"/>
   <page_type size="2097152" count="0"/>
   <object type="Socket" os_level="-1" os_index="0" cpuset="0x00000003"
       complete_cpuset="0x00000003" online_cpuset="0x00000003"
       allowed_cpuset="0x00000003" nodeset="0x00000001"
       complete_nodeset="0x00000001" allowed_nodeset="0x00000001">
     <object type="Cache" os_level="-1" cpuset="0x00000001"
         complete_cpuset="0x00000001" online_cpuset="0x00000001"
         allowed_cpuset="0x00000001" nodeset="0x00000001"
         complete_nodeset="0x00000001" allowed_nodeset="0x00000001"
         cache_size="1048576" depth="2" cache_linesize="64">
       <object type="Cache" os_level="-1" cpuset="0x00000001"
           complete_cpuset="0x00000001" online_cpuset="0x00000001"
           allowed_cpuset="0x00000001" nodeset="0x00000001"
           complete_nodeset="0x00000001" allowed_nodeset="0x00000001"
           cache_size="65536" depth="1" cache_linesize="64">
         <object type="Core" os_level="-1" os_index="0"
             cpuset="0x00000001" complete_cpuset="0x00000001"
             online_cpuset="0x00000001" allowed_cpuset="0x00000001"
             nodeset="0x00000001" complete_nodeset="0x00000001"
             allowed_nodeset="0x00000001">
           <object type="PU" os_level="-1" os_index="0" cpuset="0x00000001"
               complete_cpuset="0x00000001" online_cpuset="0x00000001"
               allowed_cpuset="0x00000001" nodeset="0x00000001"
               complete_nodeset="0x00000001" allowed_nodeset="0x00000001"/>
         </object>
       </object>
     </object>
  <!-- ...more objects listed here ... -->
</topology>
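
In the samples shown here, local_memory matches the sum of size times count
over the page_type entries of the same object: for NUMA node #0 above,
4096 * 1834516 = 7514177536 bytes. The C sketch below does that arithmetic.
The struct and function names are hypothetical, and the relationship is an
observation about these samples rather than a documented guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* One <page_type size="..." count="..."/> entry from the XML export. */
struct page_type {
    unsigned long long size;  /* page size in bytes */
    unsigned long long count; /* number of pages of that size */
};

/* Sum size * count over all page types; the result is in bytes. */
unsigned long long memory_from_pages(const struct page_type *types, size_t n)
{
    unsigned long long total = 0;
    size_t i;

    assert(types != NULL || n == 0);
    for (i = 0; i < n; i++)
        total += types[i].size * types[i].count;
    return total;
}
```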

On a 2-socket quad-core Xeon (pre-Nehalem, with 2 dual-core dies in each
socket):

                               emmett.png

Here's the same output in textual form:

Machine (16GB)
  Socket L#0
    L2 L#0 (4096KB)
      L1 L#0 (32KB) + Core L#0 + PU L#0 (P#0)
      L1 L#1 (32KB) + Core L#1 + PU L#1 (P#4)
    L2 L#1 (4096KB)
      L1 L#2 (32KB) + Core L#2 + PU L#2 (P#2)
      L1 L#3 (32KB) + Core L#3 + PU L#3 (P#6)
  Socket L#1
    L2 L#2 (4096KB)
      L1 L#4 (32KB) + Core L#4 + PU L#4 (P#1)
      L1 L#5 (32KB) + Core L#5 + PU L#5 (P#5)
    L2 L#3 (4096KB)
      L1 L#6 (32KB) + Core L#6 + PU L#6 (P#3)
      L1 L#7 (32KB) + Core L#7 + PU L#7 (P#7)

And the same output in XML (line breaks added, only PU #0 shown):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topology SYSTEM "hwloc.dtd">
<topology>
  <object type="Machine" os_level="-1" os_index="0" cpuset="0x000000ff"
   complete_cpuset="0x000000ff" online_cpuset="0x000000ff"
   allowed_cpuset="0x000000ff" dmi_board_vendor="Dell Inc."
   dmi_board_name="0NR282" local_memory="16865292288">
 <page_type size="4096" count="4117503"/>
 <page_type size="2097152" count="0"/>
 <object type="Socket" os_level="-1" os_index="0" cpuset="0x00000055"
     complete_cpuset="0x00000055" online_cpuset="0x00000055"
     allowed_cpuset="0x00000055">
   <object type="Cache" os_level="-1" cpuset="0x00000011"
       complete_cpuset="0x00000011" online_cpuset="0x00000011"
       allowed_cpuset="0x00000011" cache_size="4194304" depth="2"
       cache_linesize="64">
     <object type="Cache" os_level="-1" cpuset="0x00000001"
         complete_cpuset="0x00000001" online_cpuset="0x00000001"
         allowed_cpuset="0x00000001" cache_size="32768" depth="1"
         cache_linesize="64">
       <object type="Core" os_level="-1" os_index="0" cpuset="0x00000001"
           complete_cpuset="0x00000001" online_cpuset="0x00000001"
           allowed_cpuset="0x00000001">
         <object type="PU" os_level="-1" os_index="0" cpuset="0x00000001"
             complete_cpuset="0x00000001" online_cpuset="0x00000001"
             allowed_cpuset="0x00000001"/>
       </object>
     </object>
     <object type="Cache" os_level="-1" cpuset="0x00000010"
         complete_cpuset="0x00000010" online_cpuset="0x00000010"
         allowed_cpuset="0x00000010" cache_size="32768" depth="1"
         cache_linesize="64">
       <object type="Core" os_level="-1" os_index="1" cpuset="0x00000010"
           complete_cpuset="0x00000010" online_cpuset="0x00000010"
           allowed_cpuset="0x00000010">
         <object type="PU" os_level="-1" os_index="4" cpuset="0x00000010"
             complete_cpuset="0x00000010" online_cpuset="0x00000010"
             allowed_cpuset="0x00000010"/>
       </object>
     </object>
   </object>
  <!-- ...more objects listed here ... -->
</topology>
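
The cpusets also show which PUs share a cache. In this sample the first L2
cache has cpuset 0x00000011, so PUs 0 and 4 share it; per the textual listing,
the other three L2 caches cover PUs {2,6}, {1,5}, and {3,7} (masks 0x44, 0x22,
and 0x88, derived here rather than shown in the XML excerpt). A minimal C
sketch, with a hypothetical helper name and single-word masks assumed:

```c
#include <assert.h>
#include <stddef.h>

/* Return the index of the first cache whose cpuset mask contains both
 * PU a and PU b (OS indexes), or -1 if no listed cache is shared.
 * Assumption: every mask fits in one unsigned long and a, b < 32. */
int shared_cache(const unsigned long *cache_masks, size_t n,
                 unsigned a, unsigned b)
{
    unsigned long want;
    size_t i;

    assert(cache_masks != NULL && a < 32 && b < 32);
    want = (1UL << a) | (1UL << b);
    for (i = 0; i < n; i++)
        if ((cache_masks[i] & want) == want)
            return (int)i;
    return -1;
}
```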

Programming Interface

The basic interface is available in hwloc.h. It essentially offers low-level
routines for advanced programmers that want to manually manipulate objects and
follow links between them. Documentation for everything in hwloc.h is provided
later in this document. Developers should also look at hwloc/helper.h (also
documented later in this document), which provides good higher-level topology
traversal examples.

To precisely define the vocabulary used by hwloc, a Terms and Definitions
section is available and should probably be read first.

Each hwloc object contains a cpuset describing the list of processing units
that it contains. These bitmaps may be used for CPU binding and Memory binding.
hwloc offers an extensive bitmap manipulation interface in hwloc/bitmap.h.
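
To give a feel for the kind of queries a bitmap interface answers, here is a
sketch on plain 32-bit masks using cpusets from the CLI Examples section. The
four functions are stand-ins written for this illustration, not the functions
declared in hwloc/bitmap.h, which operate on opaque bitmaps of arbitrary size.
Keeping the lowest-numbered PU in mask_singlify() is an arbitrary choice made
for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if every PU in sub is also in super. */
int mask_isincluded(uint32_t sub, uint32_t super)
{
    return (sub & ~super) == 0;
}

/* Number of PUs in the mask. */
unsigned mask_weight(uint32_t mask)
{
    unsigned n = 0;

    for (; mask != 0; mask &= mask - 1) /* clears the lowest set bit */
        n++;
    return n;
}

/* Index of the lowest-numbered PU; the mask must not be empty. */
unsigned mask_first(uint32_t mask)
{
    unsigned index = 0;

    assert(mask != 0);
    while ((mask & 1u) == 0) {
        mask >>= 1;
        index++;
    }
    return index;
}

/* Reduce the mask to a single PU (here: the lowest-numbered one). */
uint32_t mask_singlify(uint32_t mask)
{
    return mask != 0 ? (uint32_t)1 << mask_first(mask) : 0;
}
```

For example, core #0 of the first sample machine (0x0101) is included in
socket #0 (0x1111), and reducing it to a single PU leaves 0x0001.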

Moreover, hwloc also comes with additional helpers for interoperability with
several commonly used environments. See the Interoperability With Other
Software section for details.

The complete API documentation is available in a full set of HTML pages, man
pages, and self-contained PDF files (formatted for both US letter and A4
formats) in the source tarball in doc/doxygen-doc/.

NOTE: If you are building the documentation from a Subversion checkout, you
will need to have Doxygen and pdflatex installed -- the documentation will be
built during the normal "make" process. The documentation is installed during
"make install" to $prefix/share/doc/hwloc/ and your system's default man page
tree (under $prefix, of course).

Portability

As shown in CLI Examples, hwloc can obtain information on a wide variety of
hardware topologies. However, some platforms and/or operating system versions
will only report a subset of this information. For example, on a PPC64-based
system with 32 cores (each with 2 hardware threads) running a default
2.6.18-based kernel from RHEL 5.4, hwloc is only able to glean information
about NUMA nodes and processing units (PUs). No information about caches,
sockets, or cores is available.

Similarly, operating systems have varying support for CPU and memory binding.
Some provide interfaces for all kinds of CPU and memory binding, others
provide interfaces for only a limited number of kinds, and some do not provide
any binding interface at all. In the unsupported cases, hwloc's binding
functions simply fail with the ENOSYS error (Function not implemented),
meaning that the underlying operating system does not provide any interface
for them. The CPU binding and Memory binding sections provide more information
on which hwloc binding functions should be preferred because interfaces for
them are usually available on the supported operating systems.
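
A caller can use this to tell "not supported here" apart from other failures.
The sketch below assumes only what is described above and what the API Example
does: a binding call that fails returns nonzero and leaves the reason in
errno. classify_bind() and the enum are hypothetical names, not part of hwloc.

```c
#include <assert.h>
#include <errno.h>

enum bind_outcome {
    BIND_OK,          /* the call succeeded */
    BIND_UNSUPPORTED, /* failed with ENOSYS: the OS offers no such interface */
    BIND_FAILED       /* failed for some other reason */
};

/* Classify the result of a binding call from its return value and the
 * errno value saved right after the call. Hypothetical helper.
 * Precondition: when rc is nonzero, saved_errno holds the error code. */
enum bind_outcome classify_bind(int rc, int saved_errno)
{
    assert(rc == 0 || saved_errno != 0);
    if (rc == 0)
        return BIND_OK;
    return saved_errno == ENOSYS ? BIND_UNSUPPORTED : BIND_FAILED;
}
```

Typical use would be to save errno immediately after the binding call and pass
it in, so that a later library call cannot overwrite it.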

Here's the graphical output from lstopo on this platform when Simultaneous
Multi-Threading (SMT) is enabled:

                           ppc64-with-smt.png

And here's the graphical output from lstopo on this platform when SMT is
disabled:

                          ppc64-without-smt.png

Notice that hwloc only sees half the PUs when SMT is disabled. PU #15, for
example, seems to change location from NUMA node #0 to #1. In reality, no PUs
"moved" -- they were simply re-numbered when hwloc only saw half as many.
Hence, PU #15 in the SMT-disabled picture probably corresponds to PU #30 in the
SMT-enabled picture.

This same "PUs have disappeared" effect can be seen on other platforms -- even
platforms / OSs that provide much more information than the above PPC64 system.
This is an unfortunate side-effect of how operating systems report information
to hwloc.

Note that after upgrading the Linux kernel on the same PPC64 system mentioned
above to 2.6.34, hwloc is able to discover all the topology information. The
following picture shows the entire topology layout when SMT is enabled:

                         ppc64-full-with-smt.png

Developers using the hwloc API or XML output for portable applications should
therefore be extremely careful to not make any assumptions about the structure
of data that is returned. For example, per the above reported PPC topology, it
is not safe to assume that PUs will always be descendants of cores.

Additionally, future hardware may insert new topology elements that are not
available in this version of hwloc. Long-lived applications that are meant to
span multiple different hardware platforms should also be careful about making
structure assumptions. For example, there may someday be an element "lower"
than a PU, or perhaps a new element may exist between a core and a PU.
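
One defensive pattern is to search upward for a level and accept that it may
not exist, rather than assuming a fixed parent/child layout. The C sketch
below uses a deliberately tiny stand-in structure (struct node and
find_ancestor() are hypothetical; they are not hwloc's object type or
helpers). On a topology like the PPC64 one above, where PUs hang directly
below NUMA nodes, looking for a core ancestor simply returns NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for a topology object; hwloc's real object type has
 * many more fields. Only "type" and "parent" matter here. */
enum node_type { NODE_MACHINE, NODE_NUMANODE, NODE_SOCKET, NODE_CORE, NODE_PU };

struct node {
    enum node_type type;
    struct node *parent; /* NULL for the root */
};

/* Walk up from obj and return the nearest ancestor of the wanted type,
 * or NULL if the topology has no such level above obj. */
struct node *find_ancestor(struct node *obj, enum node_type wanted)
{
    assert(obj != NULL);
    for (obj = obj->parent; obj != NULL; obj = obj->parent)
        if (obj->type == wanted)
            return obj;
    return NULL;
}
```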

API Example

The following small C example (named "hwloc-hello.c") prints the topology of
the machine and binds the process to a single logical processor of the last
core of the machine.

/* Example hwloc API program.
 *
 * Copyright © 2009-2010 INRIA.  All rights reserved.
 * Copyright © 2009-2011 Université Bordeaux 1
 * Copyright © 2009-2010 Cisco Systems, Inc.  All rights reserved.
 * See COPYING in top-level directory.
 *
 * hwloc-hello.c
 */

#include <hwloc.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>  /* malloc(), free() */
#include <string.h>

static void print_children(hwloc_topology_t topology, hwloc_obj_t obj,
                        int depth)
{
 char string[128];
 unsigned i;

 hwloc_obj_snprintf(string, sizeof(string), topology, obj, "#", 0);
 printf("%*s%s\n", 2*depth, "", string);
 for (i = 0; i < obj->arity; i++) {
     print_children(topology, obj->children[i], depth + 1);
 }
}

int main(void)
{
 int depth;
 unsigned i, n;
 unsigned long size;
 int levels;
 char string[128];
 int topodepth;
 hwloc_topology_t topology;
 hwloc_cpuset_t cpuset;
 hwloc_obj_t obj;

 /* Allocate and initialize topology object. */
 hwloc_topology_init(&topology);

 /* ... Optionally, put detection configuration here to ignore
    some objects types, define a synthetic topology, etc....  

    The default is to detect all the objects of the machine that
    the caller is allowed to access.  See Configure Topology
    Detection. */

 /* Perform the topology detection. */
 hwloc_topology_load(topology);

 /* Optionally, get some additional topology information
    in case we need the topology depth later. */
 topodepth = hwloc_topology_get_depth(topology);

 /*****************************************************************
  * First example:
  * Walk the topology with an array style, from level 0 (always
  * the system level) to the lowest level (always the proc level).
  *****************************************************************/
 for (depth = 0; depth < topodepth; depth++) {
     printf("*** Objects at level %d\n", depth);
     for (i = 0; i < hwloc_get_nbobjs_by_depth(topology, depth);
          i++) {
         hwloc_obj_snprintf(string, sizeof(string), topology,
                    hwloc_get_obj_by_depth(topology, depth, i),
                    "#", 0);
         printf("Index %u: %s\n", i, string);
     }
 }

 /*****************************************************************
  * Second example:
  * Walk the topology with a tree style.
  *****************************************************************/
 printf("*** Printing overall tree\n");
 print_children(topology, hwloc_get_root_obj(topology), 0);

 /*****************************************************************
  * Third example:
  * Print the number of sockets.
  *****************************************************************/
 depth = hwloc_get_type_depth(topology, HWLOC_OBJ_SOCKET);
 if (depth == HWLOC_TYPE_DEPTH_UNKNOWN) {
     printf("*** The number of sockets is unknown\n");
 } else {
     printf("*** %u socket(s)\n",
            hwloc_get_nbobjs_by_depth(topology, depth));
 }

 /*****************************************************************
  * Fourth example:
  * Compute the amount of cache that the first logical processor
  * has above it.
  *****************************************************************/
 levels = 0;
 size = 0;
 for (obj = hwloc_get_obj_by_type(topology, HWLOC_OBJ_PU, 0);
      obj;
      obj = obj->parent)
   if (obj->type == HWLOC_OBJ_CACHE) {
     levels++;
     size += obj->attr->cache.size;
   }
 printf("*** Logical processor 0 has %d caches totaling %luKB\n",
        levels, size / 1024);

 /*****************************************************************
  * Fifth example:
  * Bind to only one thread of the last core of the machine.
  *
  * First find out where cores are, or else smaller sets of CPUs if
  * the OS doesn't have the notion of a "core".
  *****************************************************************/
 depth = hwloc_get_type_or_below_depth(topology, HWLOC_OBJ_CORE);

 /* Get last core. */
 obj = hwloc_get_obj_by_depth(topology, depth,
                hwloc_get_nbobjs_by_depth(topology, depth) - 1);
 if (obj) {
     /* Get a copy of its cpuset that we may modify. */
     cpuset = hwloc_bitmap_dup(obj->cpuset);

     /* Get only one logical processor (in case the core is
        SMT/hyperthreaded). */
     hwloc_bitmap_singlify(cpuset);

     /* And try to bind ourself there. */
     if (hwloc_set_cpubind(topology, cpuset, 0)) {
         char *str;
         int error = errno;
         hwloc_bitmap_asprintf(&str, obj->cpuset);
         printf("Couldn't bind to cpuset %s: %s\n", str, strerror(error));
         free(str);
     }

     /* Free our cpuset copy */
     hwloc_bitmap_free(cpuset);
 }

 /*****************************************************************
  * Sixth example:
  * Allocate some memory on the last NUMA node, bind some existing
  * memory to the last NUMA node.
  *****************************************************************/
 /* Get last node. */
 n = hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_NODE);
 if (n) {
     void *m;
     size = 1024*1024;

     obj = hwloc_get_obj_by_type(topology, HWLOC_OBJ_NODE, n - 1);
     m = hwloc_alloc_membind_nodeset(topology, size, obj->nodeset,
             HWLOC_MEMBIND_DEFAULT, 0);
     hwloc_free(topology, m, size);

     m = malloc(size);
     hwloc_set_area_membind_nodeset(topology, m, size, obj->nodeset,
             HWLOC_MEMBIND_DEFAULT, 0);
     free(m);
 }

 /* Destroy topology object. */
 hwloc_topology_destroy(topology);

 return 0;
}

hwloc provides a pkg-config file so that pkg-config can report the relevant
compiler and linker flags. For example, it can be used as follows to compile
applications that utilize the hwloc library (assuming GNU Make):

CFLAGS += $(shell pkg-config --cflags hwloc)
LDLIBS += $(shell pkg-config --libs hwloc)

hwloc-hello: hwloc-hello.c
	$(CC) hwloc-hello.c $(CFLAGS) -o hwloc-hello $(LDLIBS)

On a machine with 4GB of RAM and 2 processor sockets -- each socket of which
has two processing cores -- the output from running hwloc-hello could be
something like the following:

shell$ ./hwloc-hello
*** Objects at level 0
Index 0: Machine(3938MB)
*** Objects at level 1
Index 0: Socket#0
Index 1: Socket#1
*** Objects at level 2
Index 0: Core#0
Index 1: Core#1
Index 2: Core#3
Index 3: Core#2
*** Objects at level 3
Index 0: PU#0
Index 1: PU#1
Index 2: PU#2
Index 3: PU#3
*** Printing overall tree
Machine(3938MB)
  Socket#0
    Core#0
      PU#0
    Core#1
      PU#1
  Socket#1
    Core#3
      PU#2
    Core#2
      PU#3
*** 2 socket(s)
shell$

Questions and Bugs

Questions should be sent to the devel mailing list
(http://www.open-mpi.org/community/lists/hwloc.php). Bugs should be reported
in the tracker (https://svn.open-mpi.org/trac/hwloc/).

If hwloc discovers an incorrect topology for your machine, the very first thing
you should do is ensure that you have the most recent updates installed for
your operating system. Indeed, most of hwloc's topology discovery relies on
hardware information retrieved through the operating system (e.g., via the /sys
virtual filesystem of the Linux kernel). If upgrading your OS or Linux kernel
does not solve your problem, you may also want to ensure that you are running
the most recent version of the BIOS for your machine.

If those things fail, contact us on the mailing list for additional help.
Please attach the output of lstopo from a build that was configured with the
--enable-debug option and rebuilt completely, so that it includes debugging
output.

History / Credits

hwloc is the evolution and merger of the libtopology
(http://runtime.bordeaux.inria.fr/libtopology/) project and the Portable Linux
Processor Affinity (PLPA) (http://www.open-mpi.org/projects/plpa/) project.
Because of functional and ideological overlap, these two code bases and ideas
were merged and released under the name "hwloc" as an Open MPI sub-project.

libtopology was initially developed by the INRIA Runtime Team-Project
(http://runtime.bordeaux.inria.fr/), headed by Raymond Namyst
(http://dept-info.labri.fr/~namyst/). PLPA was initially developed by the Open
MPI development team as a sub-project. Both are now deprecated in favor of
hwloc, which is distributed as an Open MPI sub-project.

Further Reading

The documentation chapters include

  * Terms and Definitions
  * Command-Line Tools
  * Environment Variables
  * CPU and Memory Binding Overview
  * Interoperability With Other Software
  * Thread Safety
  * Embedding hwloc in Other Software
  * Frequently Asked Questions

Make sure to have a look at those too!

-------------------------------------------------------------------------------

Generated on Thu Apr 14 2011 22:34:49 for Hardware Locality (hwloc) by  doxygen
1.7.3