<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<!--
        This file is autogenerated from drvlxc.html.in
        Do not edit this file. Changes will be lost.
      -->
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <link rel="stylesheet" type="text/css" href="main.css" />
    <link rel="SHORTCUT ICON" href="32favicon.png" />
    <title>libvirt: LXC container driver</title>
    <meta name="description" content="libvirt, virtualization, virtualization API" />
  </head>
  <body>
    <div id="header">
      <div id="headerLogo"></div>
      <div id="headerSearch">
        <form action="search.php" enctype="application/x-www-form-urlencoded" method="get"><div>
            <input id="query" name="query" type="text" size="12" value="" />
            <input id="submit" name="submit" type="submit" value="Search" />
          </div></form>
      </div>
    </div>
    <div id="body">
      <div id="menu">
        <ul class="l0"><li>
            <div>
              <a title="Front page of the libvirt website" class="inactive" href="index.html">Home</a>
            </div>
          </li><li>
            <div>
              <a title="Details of new features and bugs fixed in each release" class="inactive" href="news.html">News</a>
            </div>
          </li><li>
            <div>
              <a title="Applications known to use libvirt" class="inactive" href="apps.html">Applications</a>
            </div>
          </li><li>
            <div>
              <a title="Get the latest source releases, binary builds and get access to the source repository" class="inactive" href="downloads.html">Downloads</a>
            </div>
          </li><li>
            <div>
              <a title="Information for users, administrators and developers" class="active" href="docs.html">Documentation</a>
              <ul class="l1"><li>
                  <div>
                    <a title="How to compile libvirt" class="inactive" href="compiling.html">Compiling</a>
                  </div>
                </li><li>
                  <div>
                    <a title="Information about deploying and using libvirt" class="inactive" href="deployment.html">Deployment</a>
                  </div>
                </li><li>
                  <div>
                    <a title="Overview of the logical subsystems in the libvirt API" class="inactive" href="intro.html">Architecture</a>
                  </div>
                </li><li>
                  <div>
                    <a title="Description of the XML formats used in libvirt" class="inactive" href="format.html">XML format</a>
                  </div>
                </li><li>
                  <div>
                    <a title="Hypervisor specific driver information" class="active" href="drivers.html">Drivers</a>
                    <ul class="l2"><li>
                        <div>
                          <a title="Driver for the Xen hypervisor" class="inactive" href="drvxen.html">Xen</a>
                        </div>
                      </li><li>
                        <div>
                          <a title="Driver for QEMU, KQEMU, KVM and Xenner" class="inactive" href="drvqemu.html">QEMU / KVM</a>
                        </div>
                      </li><li>
                        <div>
                          <span class="active">Linux Container</span>
                        </div>
                      </li><li>
                        <div>
                          <a title="Pseudo-driver simulating APIs in memory for test suites" class="inactive" href="drvtest.html">Test</a>
                        </div>
                      </li><li>
                        <div>
                          <a title="Driver providing secure remote access to the libvirt APIs" class="inactive" href="drvremote.html">Remote</a>
                        </div>
                      </li><li>
                        <div>
                          <a title="Driver for the OpenVZ container technology" class="inactive" href="drvopenvz.html">OpenVZ</a>
                        </div>
                      </li><li>
                        <div>
                          <a title="Driver for the User Mode Linux technology" class="inactive" href="drvuml.html">UML</a>
                        </div>
                      </li><li>
                        <div>
                          <a title="Driver for the storage management APIs" class="inactive" href="storage.html">Storage</a>
                        </div>
                      </li><li>
                        <div>
                          <a title="Driver for VirtualBox" class="inactive" href="drvvbox.html">VirtualBox</a>
                        </div>
                      </li><li>
                        <div>
                          <a title="Driver for VMware ESX" class="inactive" href="drvesx.html">VMware ESX</a>
                        </div>
                      </li><li>
                        <div>
                          <a title="Driver for VMware Workstation / Player" class="inactive" href="drvvmware.html">VMware Workstation / Player</a>
                        </div>
                      </li><li>
                        <div>
                          <a title="Driver for Microsoft Hyper-V" class="inactive" href="drvhyperv.html">Microsoft Hyper-V</a>
                        </div>
                      </li><li>
                        <div>
                          <a title="Driver for IBM PowerVM" class="inactive" href="drvphyp.html">IBM PowerVM</a>
                        </div>
                      </li><li>
                        <div>
                          <a title="Driver for Virtuozzo" class="inactive" href="drvvirtuozzo.html">Virtuozzo</a>
                        </div>
                      </li><li>
                        <div>
                          <a title="Driver for bhyve" class="inactive" href="drvbhyve.html">Bhyve</a>
                        </div>
                      </li></ul>
                  </div>
                </li><li>
                  <div>
                    <a title="Reference manual for the C public API" class="inactive" href="html/index.html">API reference</a>
                  </div>
                </li><li>
                  <div>
                    <a title="Bindings of the libvirt API for other languages" class="inactive" href="bindings.html">Language bindings</a>
                  </div>
                </li><li>
                  <div>
                    <a title="Working on the internals of libvirt API, driver and daemon code" class="inactive" href="internals.html">Internals</a>
                  </div>
                </li><li>
                  <div>
                    <a title="A guide and reference for developing with libvirt" class="inactive" href="devguide.html">Development Guide</a>
                  </div>
                </li><li>
                  <div>
                    <a title="Command reference for virsh" class="inactive" href="virshcmdref.html">Virsh Commands</a>
                  </div>
                </li><li>
                  <div>
                    <a title="Project governance and code of conduct" class="inactive" href="governance.html">Governance</a>
                  </div>
                </li></ul>
            </div>
          </li><li>
            <div>
              <a title="User contributed content" class="inactive" href="http://wiki.libvirt.org">Wiki</a>
            </div>
          </li><li>
            <div>
              <a title="Frequently asked questions" class="inactive" href="http://wiki.libvirt.org/page/FAQ">FAQ</a>
            </div>
          </li><li>
            <div>
              <a title="How and where to report bugs and request features" class="inactive" href="bugs.html">Bug reports</a>
            </div>
          </li><li>
            <div>
              <a title="How to contact the developers via email and IRC" class="inactive" href="contact.html">Contact</a>
            </div>
          </li><li>
            <div>
              <a title="Available test suites for libvirt" class="inactive" href="testsuites.html">Test suites</a>
            </div>
          </li><li>
            <div>
              <a title="Miscellaneous links of interest related to libvirt" class="inactive" href="relatedlinks.html">Related Links</a>
            </div>
          </li><li>
            <div>
              <a title="Overview of all content on the website" class="inactive" href="sitemap.html">Sitemap</a>
            </div>
          </li></ul>
      </div>
      <div id="content">
        <h1>LXC container driver</h1>
        <ul><li>
            <a href="#cgroups">Control groups Requirements</a>
          </li><li>
            <a href="#namespaces">Namespace requirements</a>
          </li><li>
            <a href="#init">Default container setup</a>
            <ul><li>
                <a href="#cliargs">Command line arguments</a>
              </li><li>
                <a href="#envvars">Environment variables</a>
              </li><li>
                <a href="#fsmounts">Filesystem mounts</a>
              </li><li>
                <a href="#devnodes">Device nodes</a>
              </li></ul>
          </li><li>
            <a href="#security">Security considerations</a>
            <ul><li>
                <a href="#securenetworking">Network isolation</a>
              </li><li>
                <a href="#securefs">Filesystem isolation</a>
              </li><li>
                <a href="#secureusers">User and group isolation</a>
              </li></ul>
          </li><li>
            <a href="#activation">Systemd Socket Activation Integration</a>
          </li><li>
            <a href="#exconfig">Example configurations</a>
          </li><li>
            <a href="#capabilities">Altering the available capabilities</a>
          </li><li>
            <a href="#share">Inherit namespaces</a>
          </li><li>
            <a href="#usage">Container usage / management</a>
            <ul><li>
                <a href="#usageSave">Defining (saving) container configuration</a>
              </li><li>
                <a href="#usageView">Viewing container configuration</a>
              </li><li>
                <a href="#usageStart">Starting containers</a>
              </li><li>
                <a href="#usageStop">Stopping containers</a>
              </li><li>
                <a href="#usageReboot">Rebooting a container</a>
              </li><li>
                <a href="#usageDelete">Undefining (deleting) a container configuration</a>
              </li><li>
                <a href="#usageConnect">Connecting to a container console</a>
              </li><li>
                <a href="#usageEnter">Running commands in a container</a>
              </li><li>
                <a href="#usageTop">Monitoring container utilization</a>
              </li><li>
                <a href="#usageConvert">Converting LXC container configuration</a>
              </li></ul>
          </li></ul>
        <p>
The libvirt LXC driver manages "Linux Containers". At their simplest, containers
can just be thought of as a collection of processes, separated from the main
host processes via a set of resource namespaces and constrained via control
groups resource tunables. The libvirt LXC driver has no dependency on the LXC
userspace tools hosted on sourceforge.net. It directly utilizes the relevant
kernel features to build the container environment. This allows many libvirt
technologies to be shared across both the QEMU/KVM and LXC drivers, in
particular sVirt for mandatory access control, auditing of operations,
integration with control groups and many other features.
</p>
        <h2>
          <a name="cgroups" shape="rect" id="cgroups">Control groups Requirements</a>
          <a class="headerlink" href="#cgroups" title="Permalink to this headline"></a>
        </h2>
        <p>
In order to control the resource usage of processes inside containers, the
libvirt LXC driver requires that certain cgroups controllers are mounted on
the host OS. The minimum required controllers are 'cpuacct', 'memory' and
'devices', while recommended extra controllers are 'cpu', 'freezer' and
'blkio'. Libvirt will not mount the cgroups filesystem itself, leaving
this up to the init system to take care of. Systemd will do the right thing
in this respect, while for other init systems the <code>cgconfig</code>
init service will be required. For further information, consult the general
libvirt <a href="cgroups.html" shape="rect">cgroups documentation</a>.
</p>
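        <p>
As an illustration of why these controllers matter, resource limits declared
in the guest XML are applied through them. The following is a minimal sketch
only: the values are arbitrary placeholders rather than recommendations, and
it assumes the 'memory', 'cpu' and 'blkio' controllers are mounted on the host.
</p>
        <pre xml:space="preserve">
  &lt;!-- illustrative values only; tune for the actual workload --&gt;
  &lt;memtune&gt;
    &lt;hard_limit unit='KiB'&gt;524288&lt;/hard_limit&gt;
  &lt;/memtune&gt;
  &lt;cputune&gt;
    &lt;shares&gt;512&lt;/shares&gt;
  &lt;/cputune&gt;
  &lt;blkiotune&gt;
    &lt;weight&gt;500&lt;/weight&gt;
  &lt;/blkiotune&gt;
</pre>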
        <h2>
          <a name="namespaces" shape="rect" id="namespaces">Namespace requirements</a>
          <a class="headerlink" href="#namespaces" title="Permalink to this headline"></a>
        </h2>
        <p>
In order to separate processes inside a container from those in the
primary "host" OS environment, the libvirt LXC driver requires that
certain kernel namespaces are compiled in. Libvirt currently requires
the 'mount', 'ipc', 'pid', and 'uts' namespaces to be available. If
separate network interfaces are desired, then the 'net' namespace is
required. If the guest configuration declares a
<a href="formatdomain.html#elementsOSContainer" shape="rect">UID or GID mapping</a>,
the 'user' namespace will be enabled to apply these. <strong>A suitably
configured UID/GID mapping is a pre-requisite to making containers
secure, in the absence of sVirt confinement.</strong>
</p>
        <h2>
          <a name="init" shape="rect" id="init">Default container setup</a>
          <a class="headerlink" href="#init" title="Permalink to this headline"></a>
        </h2>
        <h3>
          <a name="cliargs" shape="rect" id="cliargs">Command line arguments</a>
          <a class="headerlink" href="#cliargs" title="Permalink to this headline"></a>
        </h3>
        <p>
When the container "init" process is started, it will typically
not be given any command line arguments (e.g. the equivalent of
the bootloader args visible in <code>/proc/cmdline</code>). If
any arguments are desired, they must be explicitly set in the
container XML configuration via one or more <code>initarg</code>
elements. For example, to run <code>systemd --unit emergency.service</code>,
use the following XML:
</p>
        <pre xml:space="preserve">
  &lt;os&gt;
    &lt;type arch='x86_64'&gt;exe&lt;/type&gt;
    &lt;init&gt;/bin/systemd&lt;/init&gt;
    &lt;initarg&gt;--unit&lt;/initarg&gt;
    &lt;initarg&gt;emergency.service&lt;/initarg&gt;
  &lt;/os&gt;
</pre>
        <h3>
          <a name="envvars" shape="rect" id="envvars">Environment variables</a>
          <a class="headerlink" href="#envvars" title="Permalink to this headline"></a>
        </h3>
        <p>
When the container "init" process is started, it will be given several useful
environment variables. The following standard environment variables are mandated
by the <a href="http://www.freedesktop.org/wiki/Software/systemd/ContainerInterface" shape="rect">systemd container interface</a>
to be provided by all container technologies on Linux.
</p>
        <dl><dt>container</dt><dd>The fixed string <code>libvirt-lxc</code> to identify libvirt as the creator</dd><dt>container_uuid</dt><dd>The UUID assigned to the container by libvirt</dd><dt>PATH</dt><dd>The fixed string <code>/bin:/usr/bin</code></dd><dt>TERM</dt><dd>The fixed string <code>linux</code></dd><dt>HOME</dt><dd>The fixed string <code>/</code></dd></dl>
        <p>
In addition to the standard variables, the following libvirt-specific
environment variables are also provided:
</p>
        <dl><dt>LIBVIRT_LXC_NAME</dt><dd>The name assigned to the container by libvirt</dd><dt>LIBVIRT_LXC_UUID</dt><dd>The UUID assigned to the container by libvirt</dd><dt>LIBVIRT_LXC_CMDLINE</dt><dd>The unparsed command line arguments specified in the container configuration.
Use of this is discouraged, in favour of passing arguments directly to the
container init process via the <code>initarg</code> config element.</dd></dl>
        <h3>
          <a name="fsmounts" shape="rect" id="fsmounts">Filesystem mounts</a>
          <a class="headerlink" href="#fsmounts" title="Permalink to this headline"></a>
        </h3>
        <p>
In the absence of any explicit configuration, the container will
inherit the host OS filesystem mounts. A number of mount points will
be made read-only, or re-mounted with new instances to provide
container-specific data. The following special mounts are set up
by libvirt:
</p>
        <ul><li><code>/dev</code> a new "tmpfs" pre-populated with authorized device nodes</li><li><code>/dev/pts</code> a new private "devpts" instance for console devices</li><li><code>/sys</code> the host "sysfs" instance remounted read-only</li><li><code>/proc</code> a new instance of the "proc" filesystem</li><li><code>/proc/sys</code> the host "/proc/sys" bind-mounted read-only</li><li><code>/sys/fs/selinux</code> the host "selinux" instance remounted read-only</li><li><code>/sys/fs/cgroup/NNNN</code> the host cgroups controllers bind-mounted to
only expose the sub-tree associated with the container</li><li><code>/proc/meminfo</code> a FUSE backed file reflecting memory limits of the container</li></ul>
        <h3>
          <a name="devnodes" shape="rect" id="devnodes">Device nodes</a>
          <a class="headerlink" href="#devnodes" title="Permalink to this headline"></a>
        </h3>
        <p>
The container init process will be started with the <code>CAP_MKNOD</code>
capability removed and blocked from re-acquiring it. As such it will
not be able to create any device nodes in <code>/dev</code> or anywhere
else in its filesystems. Libvirt itself will take care of pre-populating
the <code>/dev</code> filesystem with any devices that the container
is authorized to use. The current devices that will be made available
to all containers are:
</p>
        <ul><li><code>/dev/zero</code></li><li><code>/dev/null</code></li><li><code>/dev/full</code></li><li><code>/dev/random</code></li><li><code>/dev/urandom</code></li><li><code>/dev/stdin</code> symlinked to <code>/proc/self/fd/0</code></li><li><code>/dev/stdout</code> symlinked to <code>/proc/self/fd/1</code></li><li><code>/dev/stderr</code> symlinked to <code>/proc/self/fd/2</code></li><li><code>/dev/fd</code> symlinked to <code>/proc/self/fd</code></li><li><code>/dev/ptmx</code> symlinked to <code>/dev/pts/ptmx</code></li><li><code>/dev/console</code> symlinked to <code>/dev/pts/0</code></li></ul>
        <p>
In addition, for every console defined in the guest configuration,
a symlink will be created from <code>/dev/ttyN</code> to
the corresponding <code>/dev/pts/M</code> pseudo TTY device. The
first console will be <code>/dev/tty1</code>, with further consoles
numbered incrementally from there.
</p>
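        <p>
As a small sketch of how this numbering plays out, a guest configuration
declaring two consoles, as below, would be expected to yield
<code>/dev/tty1</code> and <code>/dev/tty2</code> inside the container.
The choice of two consoles here is purely illustrative.
</p>
        <pre xml:space="preserve">
  &lt;devices&gt;
    &lt;!-- first console: /dev/tty1 in the container --&gt;
    &lt;console type='pty'/&gt;
    &lt;!-- second console: /dev/tty2 in the container --&gt;
    &lt;console type='pty'/&gt;
  &lt;/devices&gt;
</pre>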
        <p>
Since <code>/dev/ttyN</code> and <code>/dev/console</code> are symlinks to
pts devices, the TTY device seen by the login program is a pts device. The
PAM module <code>pam_securetty</code> may therefore prevent the root user
from logging in to the container. To allow root logins, add the pts device
(for example <code>pts/0</code>) to the <code>/etc/securetty</code> file
inside the container.
</p>
        <p>
Further block or character devices will be made available to containers
depending on their configuration.
</p>
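        <p>
For instance, a host character device can be passed through with a
<code>hostdev</code> element. This is a hedged sketch: the device path
<code>/dev/input/event3</code> is a placeholder, and the general domain XML
documentation should be consulted for the full set of supported device types.
</p>
        <pre xml:space="preserve">
  &lt;devices&gt;
    &lt;!-- placeholder device path; substitute a device present on the host --&gt;
    &lt;hostdev mode='capabilities' type='misc'&gt;
      &lt;source&gt;
        &lt;char&gt;/dev/input/event3&lt;/char&gt;
      &lt;/source&gt;
    &lt;/hostdev&gt;
  &lt;/devices&gt;
</pre>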
        <h2>
          <a name="security" shape="rect" id="security">Security considerations</a>
          <a class="headerlink" href="#security" title="Permalink to this headline"></a>
        </h2>
        <p>
The libvirt LXC driver is fairly flexible in how it can be configured,
and as such does not enforce a requirement for strict security
separation between a container and the host. This allows it to be used
in scenarios where only resource control capabilities are important,
and resource sharing is desired. Applications wishing to ensure secure
isolation between a container and the host must write a configuration
that enforces it.
</p>
        <h3>
          <a name="securenetworking" shape="rect" id="securenetworking">Network isolation</a>
          <a class="headerlink" href="#securenetworking" title="Permalink to this headline"></a>
        </h3>
        <p>
If the guest configuration does not list any network interfaces,
the <code>network</code> namespace will not be activated, and thus
the container will see all the host's network interfaces. This will
allow apps in the container to bind to and connect from the TCP/UDP
addresses and ports of the host OS. It also allows applications to access
UNIX domain sockets associated with the host OS, which are in the
abstract namespace. If access to UNIX domain sockets in the abstract
namespace is not wanted, then applications should set the
<code>&lt;privnet/&gt;</code> flag in the
<code>&lt;features&gt;....&lt;/features&gt;</code> element.
</p>
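        <p>
A minimal sketch of such a configuration, for a guest with no network
interfaces of its own that should still get a private network namespace,
might look like this:
</p>
        <pre xml:space="preserve">
  &lt;features&gt;
    &lt;!-- request a private network namespace even with no interfaces defined --&gt;
    &lt;privnet/&gt;
  &lt;/features&gt;
</pre>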
        <h3>
          <a name="securefs" shape="rect" id="securefs">Filesystem isolation</a>
          <a class="headerlink" href="#securefs" title="Permalink to this headline"></a>
        </h3>
        <p>
If the guest configuration does not list any filesystems, then
the container will be set up with a root filesystem that matches
the host's root filesystem. As noted earlier, only a few locations
such as <code>/dev</code>, <code>/proc</code> and <code>/sys</code>
will be altered. This means that, in the absence of restrictions
from sVirt, a process running as user/group N:M inside the container
will be able to access almost exactly the same files as a process
running as user/group N:M in the host.
</p>
        <p>
There are multiple options for restricting this. It is possible to
simply map the existing root filesystem through to the container in
read-only mode. Alternatively a completely separate root filesystem
can be configured for the guest. In both cases, further sub-mounts
can be applied to customize the content that is made visible. Note
that in the absence of sVirt controls, it is still possible for the
root user in a container to unmount any sub-mounts applied. The user
namespace feature can also be used to restrict access to files based
on the UID/GID mappings.
</p>
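        <p>
As a sketch of the first option, the host root filesystem could be mapped
through read-only with a <code>filesystem</code> element along these lines.
For the second option, the <code>source</code> directory would instead point
at a separately prepared root tree, whose location is up to the administrator.
</p>
        <pre xml:space="preserve">
  &lt;devices&gt;
    &lt;!-- expose the host root filesystem to the container, read-only --&gt;
    &lt;filesystem type='mount'&gt;
      &lt;source dir='/'/&gt;
      &lt;target dir='/'/&gt;
      &lt;readonly/&gt;
    &lt;/filesystem&gt;
  &lt;/devices&gt;
</pre>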
        <p>
Sharing the host filesystem tree also allows applications to access
UNIX domain sockets associated with the host OS, which are in the
filesystem namespace. It should be noted that a number of init
systems including at least <code>systemd</code> and <code>upstart</code>
have UNIX domain sockets which are used to control their operation.
Thus, if the directory/filesystem holding their UNIX domain socket is
exposed to the container, it will be possible for a user in the container
to invoke operations on the init service in the same way it could if
outside the container. This also applies to other applications in the
host which use UNIX domain sockets in the filesystem, such as DBus,
Libvirtd, and many more. If this is not desired, then applications
should either specify the UID/GID mapping in the configuration to
enable user namespaces and thus block access to the UNIX domain socket
based on permissions, or should ensure the relevant directories have
a bind mount to hide them. This is particularly important for the
<code>/run</code> or <code>/var/run</code> directories.
</p>
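        <p>
One way to hide such a directory, sketched here under the assumption that
an empty, container-private directory has been created on the host, is to
bind mount it over <code>/run</code>. The source path below is hypothetical
and should be replaced with a directory of your choosing.
</p>
        <pre xml:space="preserve">
  &lt;devices&gt;
    &lt;!-- hypothetical host directory, mounted over /run to mask the host's sockets --&gt;
    &lt;filesystem type='mount'&gt;
      &lt;source dir='/var/lib/libvirt/filesystems/mycontainer/run'/&gt;
      &lt;target dir='/run'/&gt;
    &lt;/filesystem&gt;
  &lt;/devices&gt;
</pre>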
        <h3>
          <a name="secureusers" shape="rect" id="secureusers">User and group isolation</a>
          <a class="headerlink" href="#secureusers" title="Permalink to this headline"></a>
        </h3>
        <p>
If the guest configuration does not list any ID mapping, then the
user and group IDs used inside the container will match those used
outside the container. In addition, the capabilities associated with
a process in the container will confer the same privileges they would
for a process in the host. This has obvious implications for security,
since a root user inside the container will be able to access any
file owned by root that is visible to the container, and perform more
or less any privileged kernel operation. In the absence of additional
protection from sVirt, this means that the root user inside a container
is effectively as powerful as the root user in the host. There is no
security isolation of the root user.
</p>
        <p>
The ID mapping facility was introduced to allow for stricter control
over the privileges of users inside the container. It allows apps to
define rules such as "user ID 0 in the container maps to user ID 1000
in the host". In addition the privileges associated with capabilities
are somewhat reduced so that they cannot be used to escape from the
container environment. A full description of user namespaces is outside
the scope of this document, however LWN has
<a href="https://lwn.net/Articles/532593/" shape="rect">a good write-up on the topic</a>.
From the libvirt point of view, the key thing to remember is that defining
an ID mapping for users and groups in the container XML configuration
causes libvirt to activate the user namespace feature.
</p>
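        <p>
For example, a rule such as "user ID 0 in the container maps to user ID 1000
in the host" could be written roughly as follows. The range of 10 IDs and the
matching group mapping are illustrative choices, not requirements.
</p>
        <pre xml:space="preserve">
  &lt;idmap&gt;
    &lt;!-- container UIDs 0-9 map to host UIDs 1000-1009 (illustrative range) --&gt;
    &lt;uid start='0' target='1000' count='10'/&gt;
    &lt;!-- container GIDs 0-9 map to host GIDs 1000-1009 (illustrative range) --&gt;
    &lt;gid start='0' target='1000' count='10'/&gt;
  &lt;/idmap&gt;
</pre>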
        <h2>
          <a name="activation" shape="rect" id="activation">Systemd Socket Activation Integration</a>
          <a class="headerlink" href="#activation" title="Permalink to this headline"></a>
        </h2>
        <p>
The libvirt LXC driver provides the ability to pass across pre-opened file
descriptors when starting LXC guests. This allows for libvirt LXC to support
systemd's <a href="http://0pointer.de/blog/projects/socket-activated-containers.html" shape="rect">socket
activation capability</a>, where an incoming client connection
in the host OS will trigger the startup of a container, which runs another
copy of systemd which gets passed the server socket, and then activates the
actual service handler in the container.
</p>
        <p>
Let us assume that you already have an LXC guest created, running
a systemd instance as PID 1 inside the container, which has an
SSHD service configured. The goal is to automatically activate
the container when the first SSH connection is made. The first
step is to create a couple of unit files for the host OS systemd
instance. The <code>/etc/systemd/system/mycontainer.service</code>
unit file specifies how systemd will start the libvirt LXC container
</p>
        <pre xml:space="preserve">
[Unit]
Description=My little container

[Service]
ExecStart=/usr/bin/virsh -c lxc:/// start --pass-fds 3 mycontainer
ExecStop=/usr/bin/virsh -c lxc:/// destroy mycontainer
Type=oneshot
RemainAfterExit=yes
KillMode=none
</pre>
        <p>
The <code>--pass-fds 3</code> argument specifies that file
descriptor number 3, which <code>virsh</code> inherits from systemd,
is to be passed into the container. Since <code>virsh</code> will
exit immediately after starting the container, the <code>RemainAfterExit</code>
and <code>KillMode</code> settings must be altered from their defaults.
</p>
        <p>
Next, the <code>/etc/systemd/system/mycontainer.socket</code> unit
file is created to get the host systemd to listen on port 23 for
TCP connections. When this unit file is activated by the first
incoming connection, it will cause the <code>mycontainer.service</code>
unit to be activated with the FD corresponding to the listening TCP
socket passed in as FD 3.
</p>
        <pre xml:space="preserve">
[Unit]
Description=The SSH socket of my little container

[Socket]
ListenStream=23
</pre>
        <p>
Port 23 was picked here so that the container doesn't conflict
with the host's SSH which is on the normal port 22. That's it
in terms of host side configuration.
</p>
        <p>
Inside the container, the <code>/etc/systemd/system/sshd.socket</code>
unit file must be created
</p>
        <pre xml:space="preserve">
[Unit]
Description=SSH Socket for Per-Connection Servers

[Socket]
ListenStream=23
Accept=yes
</pre>
        <p>
The <code>ListenStream</code> value listed in this unit file must
match the value used in the host file. When systemd in the container
receives the pre-opened FD from libvirt during container startup, it
looks at the <code>ListenStream</code> values to figure out which
FD to give to which service. The actual service to start is defined
by a correspondingly named <code>/etc/systemd/system/sshd@.service</code>
unit file:
</p>
        <pre xml:space="preserve">
[Unit]
Description=SSH Per-Connection Server for %I

[Service]
ExecStart=-/usr/sbin/sshd -i
StandardInput=socket
</pre>
        <p>
Finally, make sure this SSH service is set to start on boot of the container,
by running the following commands inside the container:
</p>
        <pre xml:space="preserve">
# mkdir -p /etc/systemd/system/sockets.target.wants/
# ln -s /etc/systemd/system/sshd.socket /etc/systemd/system/sockets.target.wants/
</pre>
        <p>
This example shows how to activate the container based on an incoming
SSH connection. If the container were also configured to have an httpd
service, it may be desirable to activate it upon either an httpd or an
sshd connection attempt. In this case, the <code>mycontainer.socket</code>
file in the host would simply list multiple socket ports. Inside the
container a separate <code>xxxxx.socket</code> file would need to be
created for each service, with a corresponding <code>ListenStream</code>
value set.
</p>
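        <p>
As a rough sketch of that multi-service setup, assume the container's
httpd listens on port 80 and supports socket activation. The port, the
<code>httpd.socket</code> unit name and the descriptions below are
illustrative assumptions, not part of the example above. The host
<code>mycontainer.socket</code> file would then list both ports:
</p>
        <pre xml:space="preserve">
[Unit]
Description=The SSH and HTTP sockets of my little container

[Socket]
ListenStream=23
ListenStream=80
</pre>
        <p>
With two listening sockets, systemd is expected to hand them to
<code>virsh</code> as FDs 3 and 4, so the <code>ExecStart</code> line in
<code>mycontainer.service</code> would need to pass both, for example
with <code>--pass-fds 3,4</code>. Inside the container, a hypothetical
<code>/etc/systemd/system/httpd.socket</code> would carry the matching value:
</p>
        <pre xml:space="preserve">
[Unit]
Description=HTTP Socket

[Socket]
ListenStream=80
</pre>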
        <h2>Container security</h2>
        <h3>sVirt SELinux</h3>
        <p>
When the "user" namespace is not in use, containers cannot
be considered secure against exploits of the host OS. The sVirt SELinux
driver provides a way to secure containers even when the "user" namespace
is not used. The cost is that writing a policy to allow execution of
an arbitrary OS is not practical. The SELinux sVirt policy is typically
tailored to work with a simpler application confinement use case,
as provided by the "libvirt-sandbox" project.
</p>
        <h3>Auditing</h3>
        <p>
The LXC driver is integrated with libvirt's auditing subsystem, which
causes audit messages to be logged whenever there is an operation
performed against a container which has impact on host resources.
For example, start/stop and device hotplug operations will all log audit
messages providing details about what action occurred and any resources
associated with it. There are the following 3 types of audit messages:
</p>
        <ul><li><code>VIRT_MACHINE_ID</code> - details of the SELinux process and
image security labels assigned to the container.</li><li><code>VIRT_CONTROL</code> - details of an action / operation
performed against a container. There are the following types of
operation:
  <ul><li><code>op=start</code> - a container has been started. Provides
      the machine name, uuid and PID of the <code>libvirt_lxc</code>
      controller process</li><li><code>op=init</code> - the init PID of the container has been
      started. Provides the machine name, uuid and PID of the
      <code>libvirt_lxc</code> controller process and PID of the
      init process (in the host PID namespace)</li><li><code>op=stop</code> - a container has been stopped. Provides
      the machine name, uuid</li></ul>
</li><li><code>VIRT_RESOURCE</code> - details of a host resource
associated with a container action.</li></ul>
        <h3>Device access</h3>
        <p>
All containers are launched with the CAP_MKNOD capability cleared
and removed from the bounding set. Libvirt will ensure that the
/dev filesystem is pre-populated with all devices that a container
is allowed to use. In addition, the cgroup "device" controller is
configured to block read/write/mknod from all devices except those
that a container is authorized to use.
</p>
        <h2>
          <a name="exconfig" shape="rect" id="exconfig">Example configurations</a>
          <a class="headerlink" href="#exconfig" title="Permalink to this headline"></a>
        </h2>
        <h3>Example config version 1</h3>
        <pre xml:space="preserve">
&lt;domain type='lxc'&gt;
  &lt;name&gt;vm1&lt;/name&gt;
  &lt;memory&gt;500000&lt;/memory&gt;
  &lt;os&gt;
    &lt;type&gt;exe&lt;/type&gt;
    &lt;init&gt;/bin/sh&lt;/init&gt;
  &lt;/os&gt;
  &lt;vcpu&gt;1&lt;/vcpu&gt;
  &lt;clock offset='utc'/&gt;
  &lt;on_poweroff&gt;destroy&lt;/on_poweroff&gt;
  &lt;on_reboot&gt;restart&lt;/on_reboot&gt;
  &lt;on_crash&gt;destroy&lt;/on_crash&gt;
  &lt;devices&gt;
    &lt;emulator&gt;/usr/libexec/libvirt_lxc&lt;/emulator&gt;
    &lt;interface type='network'&gt;
      &lt;source network='default'/&gt;
    &lt;/interface&gt;
    &lt;console type='pty' /&gt;
  &lt;/devices&gt;
&lt;/domain&gt;
</pre>
        <p>
In the &lt;emulator&gt; element, be sure you specify the correct path
to libvirt_lxc, if it does not live in /usr/libexec on your system.
</p>
        <p>
The next example assumes there is a private root filesystem
(perhaps hand-crafted using busybox, or installed from media,
debootstrap, whatever) under /opt/vm-1-root:
</p>
        <pre xml:space="preserve">
&lt;domain type='lxc'&gt;
  &lt;name&gt;vm1&lt;/name&gt;
  &lt;memory&gt;32768&lt;/memory&gt;
  &lt;os&gt;
    &lt;type&gt;exe&lt;/type&gt;
    &lt;init&gt;/init&lt;/init&gt;
  &lt;/os&gt;
  &lt;vcpu&gt;1&lt;/vcpu&gt;
  &lt;clock offset='utc'/&gt;
  &lt;on_poweroff&gt;destroy&lt;/on_poweroff&gt;
  &lt;on_reboot&gt;restart&lt;/on_reboot&gt;
  &lt;on_crash&gt;destroy&lt;/on_crash&gt;
  &lt;devices&gt;
    &lt;emulator&gt;/usr/libexec/libvirt_lxc&lt;/emulator&gt;
    &lt;filesystem type='mount'&gt;
      &lt;source dir='/opt/vm-1-root'/&gt;
      &lt;target dir='/'/&gt;
    &lt;/filesystem&gt;
    &lt;interface type='network'&gt;
      &lt;source network='default'/&gt;
    &lt;/interface&gt;
    &lt;console type='pty' /&gt;
  &lt;/devices&gt;
&lt;/domain&gt;
</pre>
        <h2>
          <a name="capabilities" shape="rect" id="capabilities">Altering the available capabilities</a>
          <a class="headerlink" href="#capabilities" title="Permalink to this headline"></a>
        </h2>
        <p>
By default the libvirt LXC driver drops some capabilities, among them CAP_MKNOD.
However, <span class="since">since 1.2.6</span> libvirt can be told to keep or
drop specific capabilities using a domain configuration like the following:
</p>
        <pre xml:space="preserve">
...
&lt;features&gt;
  &lt;capabilities policy='default'&gt;
    &lt;mknod state='on'/&gt;
    &lt;sys_chroot state='off'/&gt;
  &lt;/capabilities&gt;
&lt;/features&gt;
...
</pre>
        <p>
The child elements of <code>capabilities</code> are named after the capabilities as defined in
<code>man 7 capabilities</code>. An <code>off</code> state tells libvirt to drop the
capability, while an <code>on</code> state forces libvirt to keep the capability even if
it is dropped by default.
</p>
        <p>
The <code>policy</code> attribute can be one of <code>default</code>, <code>allow</code>
or <code>deny</code>. It defines the default rule for capabilities: keep the
default behavior of dropping a few selected capabilities, keep all capabilities,
or drop all capabilities. The advantage of <code>allow</code> and <code>deny</code> is that
they guarantee that all capabilities will be kept (or removed) even if new ones are added
later.
</p>
        <p>
The following example drops all capabilities but CAP_MKNOD:
</p>
        <pre xml:space="preserve">
...
&lt;features&gt;
  &lt;capabilities policy='deny'&gt;
    &lt;mknod state='on'/&gt;
  &lt;/capabilities&gt;
&lt;/features&gt;
...
</pre>
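        <p>
Conversely, as a sketch of the <code>allow</code> policy, the following
would keep all capabilities except CAP_NET_RAW. The choice of
<code>net_raw</code> here is purely illustrative:
</p>
        <pre xml:space="preserve">
...
&lt;features&gt;
  &lt;capabilities policy='allow'&gt;
    &lt;net_raw state='off'/&gt;
  &lt;/capabilities&gt;
&lt;/features&gt;
...
</pre>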
        <p>
Note that allowing capabilities that are normally dropped by default can seriously
affect the security of the container and the host.
</p>
        <h2>
          <a name="share" shape="rect" id="share">Inherit namespaces</a>
          <a class="headerlink" href="#share" title="Permalink to this headline"></a>
        </h2>
        <p>
Libvirt allows a container to inherit namespaces from another container or process,
much as the lxc tools or docker allow the network namespace to be shared. The following
can be used to share the required namespaces. If only one namespace is to be shared,
the other elements can be omitted. The <code>netns</code> type is specific to
<code>sharenet</code>. It can be used when the container should join an existing network
namespace rather than have a new one created for it. In this case the privnet option
is ignored.
</p>
        <pre xml:space="preserve">
&lt;domain type='lxc' xmlns:lxc='http://libvirt.org/schemas/domain/lxc/1.0'&gt;
...
&lt;lxc:namespace&gt;
  &lt;lxc:sharenet type='netns' value='red'/&gt;
  &lt;lxc:shareuts type='name' value='container1'/&gt;
  &lt;lxc:shareipc type='pid' value='12345'/&gt;
&lt;/lxc:namespace&gt;
&lt;/domain&gt;
</pre>
        <h2>
          <a name="usage" shape="rect" id="usage">Container usage / management</a>
          <a class="headerlink" href="#usage" title="Permalink to this headline"></a>
        </h2>
        <p>
As with any libvirt virtualization driver, LXC containers can be
managed via a wide variety of libvirt based tools. At the lowest
level the <code>virsh</code> command can be used to perform many
tasks, by passing the <code>-c lxc:///</code> argument. As an
alternative to repeating the URI with every command, the <code>LIBVIRT_DEFAULT_URI</code>
environment variable can be set to <code>lxc:///</code>. The
examples that follow outline some common operations with virsh
and LXC. For further details about usage of virsh consult its
manual page.
</p>
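        <p>
For instance, a shell session that sets the variable once and then omits
the <code>-c</code> argument might look like the following sketch
(<code>virsh list --all</code> is used here only as a sample command):
</p>
        <pre xml:space="preserve">
# export LIBVIRT_DEFAULT_URI=lxc:///
# virsh list --all
</pre>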
        <h3>
          <a name="usageSave" shape="rect" id="usageSave">Defining (saving) container configuration</a>
          <a class="headerlink" href="#usageSave" title="Permalink to this headline"></a>
        </h3>
        <p>
The <code>virsh define</code> command takes an XML configuration
document and loads it into libvirt, saving the configuration on disk:
</p>
        <pre xml:space="preserve">
# virsh -c lxc:/// define myguest.xml
</pre>
        <h3>
          <a name="usageView" shape="rect" id="usageView">Viewing container configuration</a>
          <a class="headerlink" href="#usageView" title="Permalink to this headline"></a>
        </h3>
        <p>
The <code>virsh dumpxml</code> command can be used to view the
current XML configuration of a container. By default the XML
output reflects the current state of the container. If the
container is running, it is possible to explicitly request the
persistent configuration, instead of the current live configuration
using the <code>--inactive</code> flag.
</p>
        <pre xml:space="preserve">
# virsh -c lxc:/// dumpxml myguest
</pre>
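        <p>
To view the persistent configuration of a running container instead,
the <code>--inactive</code> flag is added to the same command, for example:
</p>
        <pre xml:space="preserve">
# virsh -c lxc:/// dumpxml --inactive myguest
</pre>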
        <h3>
          <a name="usageStart" shape="rect" id="usageStart">Starting containers</a>
          <a class="headerlink" href="#usageStart" title="Permalink to this headline"></a>
        </h3>
        <p>
The <code>virsh start</code> command can be used to start a
container from a previously defined persistent configuration:
</p>
        <pre xml:space="preserve">
# virsh -c lxc:/// start myguest
</pre>
        <p>
It is also possible to start so called "transient" containers,
which do not require a persistent configuration to be saved
by libvirt, using the <code>virsh create</code> command.
</p>
        <pre xml:space="preserve">
# virsh -c lxc:/// create myguest.xml
</pre>
        <h3>
          <a name="usageStop" shape="rect" id="usageStop">Stopping containers</a>
          <a class="headerlink" href="#usageStop" title="Permalink to this headline"></a>
        </h3>
        <p>
The <code>virsh shutdown</code> command can be used
to request a graceful shutdown of the container. By default
this command will first attempt to send a message to the
init process via the <code>/dev/initctl</code> device node.
If no such device node exists, then it will send SIGTERM
to PID 1 inside the container.
</p>
        <pre xml:space="preserve">
# virsh -c lxc:/// shutdown myguest
</pre>
        <p>
If the container does not respond to the graceful shutdown
request, it can be forcibly stopped using the <code>virsh destroy</code>
command:
</p>
        <pre xml:space="preserve">
# virsh -c lxc:/// destroy myguest
</pre>
        <h3>
          <a name="usageReboot" shape="rect" id="usageReboot">Rebooting a container</a>
          <a class="headerlink" href="#usageReboot" title="Permalink to this headline"></a>
        </h3>
        <p>
The <code>virsh reboot</code> command can be used
to request a graceful reboot of the container. By default
this command will first attempt to send a message to the
init process via the <code>/dev/initctl</code> device node.
If no such device node exists, then it will send SIGHUP
to PID 1 inside the container.
</p>
        <pre xml:space="preserve">
# virsh -c lxc:/// reboot myguest
</pre>
        <h3>
          <a name="usageDelete" shape="rect" id="usageDelete">Undefining (deleting) a container configuration</a>
          <a class="headerlink" href="#usageDelete" title="Permalink to this headline"></a>
        </h3>
        <p>
The <code>virsh undefine</code> command can be used to delete the
persistent configuration of a container. If the guest is currently
running, this will turn it into a "transient" guest.
</p>
        <pre xml:space="preserve">
# virsh -c lxc:/// undefine myguest
</pre>
        <h3>
          <a name="usageConnect" shape="rect" id="usageConnect">Connecting to a container console</a>
          <a class="headerlink" href="#usageConnect" title="Permalink to this headline"></a>
        </h3>
        <p>
The <code>virsh console</code> command can be used to connect
to the text console associated with a container.
</p>
        <pre xml:space="preserve">
# virsh -c lxc:/// console myguest
</pre>
        <p>
If the container has been configured with multiple console devices,
then the <code>--devname</code> argument can be used to choose the
console to connect to.
In LXC, multiple consoles will be named
as 'console0', 'console1', 'console2', etc.
</p>
        <pre xml:space="preserve">
# virsh -c lxc:/// console myguest --devname console1
</pre>
        <h3>
          <a name="usageEnter" shape="rect" id="usageEnter">Running commands in a container</a>
          <a class="headerlink" href="#usageEnter" title="Permalink to this headline"></a>
        </h3>
        <p>
The <code>virsh lxc-enter-namespace</code> command can be used
to enter the namespaces and security context of a container
and then execute an arbitrary command.
</p>
        <pre xml:space="preserve">
# virsh -c lxc:/// lxc-enter-namespace myguest -- /bin/ls -al /dev
</pre>
        <h3>
          <a name="usageTop" shape="rect" id="usageTop">Monitoring container utilization</a>
          <a class="headerlink" href="#usageTop" title="Permalink to this headline"></a>
        </h3>
        <p>
The <code>virt-top</code> command can be used to monitor the
activity and resource utilization of all containers on a
host:
</p>
        <pre xml:space="preserve">
# virt-top -c lxc:///
</pre>
        <h3>
          <a name="usageConvert" shape="rect" id="usageConvert">Converting LXC container configuration</a>
          <a class="headerlink" href="#usageConvert" title="Permalink to this headline"></a>
        </h3>
        <p>
The <code>virsh domxml-from-native</code> command can be used to convert
most of the LXC container configuration into a domain XML fragment:
</p>
        <pre xml:space="preserve">
# virsh -c lxc:/// domxml-from-native lxc-tools /var/lib/lxc/myguest/config
</pre>
        <p>
This conversion has some limitations due to the fact that the
domxml-from-native command output has to be independent of the host. Here
are a few things to take care of before converting:
</p>
        <ul><li>
Replace the fstab file referenced by <tt>lxc.mount</tt> with the corresponding
<tt>lxc.mount.entry</tt> lines.
</li><li>
Replace all relative sizes of tmpfs mount entries with absolute sizes. Also
make sure that tmpfs entries all have a size option (default is 50%).
</li><li>
Define <tt>lxc.cgroup.memory.limit_in_bytes</tt> to properly limit the memory
available to the container. The conversion will use 64MiB as the default.
</li></ul>
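        <p>
As a sketch of what a prepared configuration might contain, the following
lines show a tmpfs mount entry with an absolute size and an explicit
memory limit. The mount point, the size and the limit are illustrative
values, and the exact size syntax accepted by the converter is an
assumption here:
</p>
        <pre xml:space="preserve">
lxc.mount.entry = tmpfs run tmpfs rw,size=65536k 0 0
lxc.cgroup.memory.limit_in_bytes = 268435456
</pre>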
      </div>
    </div>
  </body>
</html>