Browsers are
difficult
first to solve it
System calls and process communicate as assembly
if process in user space we can manipulate it with
hard links are limited
-
Operating Systems
Mostly based on xv6
-
How does the OS
engage with external
(to the CPU) devices?
How does the OS use
(persistent) Storage?
file abstraction
file system
implementation
Overall Organization
pg.464
Directory Organization
file organization in memory
(on-disk organization of the data structures
of the vsfs file system) pg. 463
Inodes
allocation structures
superblock
4kb blocks
how to assemble a full
directory tree ?
first making file
systems
mounting it to make
visible
mount()
pg.456
mkfs
pg. 456
Links
Hard Links
entry in the file system
tree, through a system
call known as
link() pg.452
soft links
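A minimal sketch of the difference between hard and soft links, assuming a POSIX system. The helper names make_links and link_count are hypothetical and only for illustration. A hard link adds a second directory entry for the same inode, so the link count goes up; a soft link is a separate file that stores a path and can dangle.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create `target`, then give it a hard link `hard` and a symbolic link `soft`.
 * Returns 0 on success, -1 on error. (Hypothetical helper.) */
int make_links(const char *target, const char *hard, const char *soft)
{
    int fd = open(target, O_CREAT | O_WRONLY, S_IRUSR | S_IWUSR);
    if (fd < 0)
        return -1;
    close(fd);
    if (link(target, hard) != 0)      /* new directory entry, same inode */
        return -1;
    if (symlink(target, soft) != 0)   /* new file whose contents are a path */
        return -1;
    return 0;
}

/* Hard-link count of `path`, or -1 if stat() fails.
 * stat() follows symbolic links, so a dangling soft link reports -1. */
long link_count(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long)st.st_nlink;
}
```

After make_links, both names report a link count of 2. Removing the original name with unlink() leaves the hard link usable with a count of 1, while the soft link now points at nothing.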
Free Space
Management
bitmaps
pg.470
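A small sketch of bitmap-based free-space management, assuming a hypothetical 64-block disk and a first-fit policy (both are illustration choices, not taken from the notes). One bit per block: 1 means in use, 0 means free.

```c
#include <stdint.h>

#define NBLOCKS 64  /* hypothetical tiny disk */

typedef struct {
    uint8_t bits[NBLOCKS / 8];  /* one bit per block */
} bitmap_t;

/* Is block `b` marked in use? */
int bitmap_in_use(const bitmap_t *bm, int b)
{
    return (bm->bits[b / 8] >> (b % 8)) & 1;
}

/* First-fit allocation: mark the lowest free block as used.
 * Returns the block number, or -1 if the disk is full. */
int bitmap_alloc(bitmap_t *bm)
{
    for (int b = 0; b < NBLOCKS; b++) {
        if (!bitmap_in_use(bm, b)) {
            bm->bits[b / 8] |= (uint8_t)(1u << (b % 8));
            return b;
        }
    }
    return -1;
}

/* Mark block `b` as free again. */
void bitmap_free(bitmap_t *bm, int b)
{
    bm->bits[b / 8] &= (uint8_t)~(1u << (b % 8));
}
```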
how to assemble a full
directory tree from
many underlying file
systems?
mkfs pg.456
File System Interface
pg.443
files
How do we create files?
int fd = open("foo", O_CREAT | O_WRONLY | O_TRUNC, S_IRUSR | S_IWUSR);
pg. 443
returns: a file descriptor
once you have such an
object, you can call
other “methods” to
access the file, like
read() and write()
creat() pg.443
How do we read and write
files?
read()
write()
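A minimal sketch of the open()/write()/read() sequence, assuming a POSIX system. The helper name roundtrip is hypothetical. It creates a file, writes a string, reopens the file, and reads the string back through the returned file descriptor.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create (or truncate) `path`, write `msg`, then read it back into `out`.
 * Returns the number of bytes read back, or -1 on error. */
ssize_t roundtrip(const char *path, const char *msg, char *out, size_t outsz)
{
    /* With O_CREAT, open() takes a third argument: the new file's permissions. */
    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, S_IRUSR | S_IWUSR);
    if (fd < 0)
        return -1;
    size_t len = strlen(msg);
    ssize_t written = write(fd, msg, len);
    close(fd);
    if (written != (ssize_t)len)
        return -1;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, out, outsz - 1);  /* leave room for the terminator */
    close(fd);
    if (n < 0)
        return -1;
    out[n] = '\0';
    return n;
}
```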
Reading And Writing,
But Not Sequentially
lseek() system call pg.446
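A short sketch of non-sequential access, assuming a POSIX system. The helper name byte_at is hypothetical. lseek() only changes the offset stored for the open file; the following read() is what actually touches the data.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Read the single byte at absolute offset `off` in `path`.
 * Returns the byte (0..255), or -1 on error or if `off` is past end of file. */
int byte_at(const char *path, off_t off)
{
    if (off < 0)
        return -1;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    unsigned char c;
    int result = -1;
    /* SEEK_SET: position the file offset exactly `off` bytes from the start. */
    if (lseek(fd, off, SEEK_SET) == off && read(fd, &c, 1) == 1)
        result = c;
    close(fd);
    return result;
}
```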
Writing Immediately
fsync()
Renaming Files pg. 448
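fsync() and rename() are often combined to update a file atomically. The sketch below assumes a POSIX system and that both paths are on the same file system; atomic_update is a hypothetical helper name. The new contents are written to a temporary file, forced to disk, and then renamed over the old file, so a crash leaves either the old or the new version.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Replace the contents of `path` with `data` via a temporary file.
 * Returns 0 on success, -1 on error. */
int atomic_update(const char *path, const char *tmp_path, const char *data)
{
    int fd = open(tmp_path, O_CREAT | O_WRONLY | O_TRUNC, S_IRUSR | S_IWUSR);
    if (fd < 0)
        return -1;
    size_t len = strlen(data);
    /* fsync() returns only after the data has been forced to the device. */
    if (write(fd, data, len) != (ssize_t)len || fsync(fd) != 0) {
        close(fd);
        unlink(tmp_path);
        return -1;
    }
    close(fd);
    /* rename() swaps the name to the new file in one atomic step. */
    return rename(tmp_path, path);
}
```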
Removing Files
unlink() pg.450
why unlink? pg 452
or see link Node
dir
Making Directories
mkdir()
39.11 Reading
Directories
opendir() pg.451
readdir()
closedir()
deleting directories
rmdir() pg. 452
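A minimal sketch of the directory calls, assuming a POSIX system; count_entries is a hypothetical helper name. It walks a directory with opendir()/readdir()/closedir() and counts everything except "." and "..".

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Count the entries in directory `path`, skipping "." and "..".
 * Returns -1 if the directory cannot be opened. */
int count_entries(const char *path)
{
    DIR *dp = opendir(path);
    if (dp == NULL)
        return -1;
    int n = 0;
    struct dirent *d;
    while ((d = readdir(dp)) != NULL) {
        if (strcmp(d->d_name, ".") != 0 && strcmp(d->d_name, "..") != 0)
            n++;
    }
    closedir(dp);
    return n;
}
```

Combined with mkdir() and rmdir(), this shows why rmdir() is the safer delete: it fails unless the directory is empty.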
The Crash Consistency
Problem
How do we avoid
ruining files during
crash?
Solution #1: The File
System Checker
FSCK
pg. 495
problems
too slow
can't fix case where the
file system looks
consistent but the
inode points to garbage
data
Security issues: a block
could migrate from the
password file to some
other random file.
Solution #2: Journaling
pg. 491
(based on write-ahead
logging)
Step1: Journal write
pg.501
Step2: Journal commit:
Step3: Checkpoint
Step4: Free
pg. 503
How does Journaling
Recover?
if crash before step2
easy: the pending
update is simply
skipped
pg.501
the crash happens after
Step2 but before Step3
redo logging
pg.501
any point during
checkpointing
no problem
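The recovery cases above can be sketched with a toy in-memory "disk". Everything here is an assumption for illustration: the tiny block size, the single-transaction journal, and all type and function names are hypothetical, not the layout of any real file system. Recovery replays a transaction only if its commit was recorded (redo logging); otherwise the pending update is skipped.

```c
#include <stdbool.h>
#include <string.h>

#define DISK_BLOCKS    8
#define BLOCK_SIZE     16
#define MAX_TXN_BLOCKS 4

typedef struct {
    bool in_use;                            /* a transaction has been started */
    bool committed;                         /* Step 2 (commit) reached the journal */
    int  nblocks;
    int  target[MAX_TXN_BLOCKS];            /* final block numbers */
    char data[MAX_TXN_BLOCKS][BLOCK_SIZE];  /* logged block contents */
} txn_t;

typedef struct {
    char  blocks[DISK_BLOCKS][BLOCK_SIZE];  /* the file system proper */
    txn_t journal;                          /* single-transaction journal */
} disk_t;

/* Step 1: journal write (one block per call). Returns 0 or -1. */
int journal_add(disk_t *d, int blockno, const char *contents)
{
    txn_t *t = &d->journal;
    if (blockno < 0 || blockno >= DISK_BLOCKS || t->nblocks >= MAX_TXN_BLOCKS)
        return -1;
    t->in_use = true;
    t->target[t->nblocks] = blockno;
    memset(t->data[t->nblocks], 0, BLOCK_SIZE);
    strncpy(t->data[t->nblocks], contents, BLOCK_SIZE - 1);
    t->nblocks++;
    return 0;
}

/* Step 2: journal commit. */
void journal_commit(disk_t *d) { d->journal.committed = true; }

/* Step 3: checkpoint logged block number `i` to its final location. */
void checkpoint_block(disk_t *d, int i)
{
    memcpy(d->blocks[d->journal.target[i]], d->journal.data[i], BLOCK_SIZE);
}

/* Step 4: free the journal space. */
void journal_free(disk_t *d) { memset(&d->journal, 0, sizeof d->journal); }

/* Crash recovery: replay a committed transaction, skip an uncommitted one. */
void recover(disk_t *d)
{
    if (d->journal.in_use && d->journal.committed)
        for (int i = 0; i < d->journal.nblocks; i++)
            checkpoint_block(d, i);  /* redo: safe to repeat after a partial checkpoint */
    journal_free(d);
}
```

Replaying is idempotent, which is why a crash at any point during checkpointing is harmless: recovery simply writes the same logged blocks again.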
problems
we are writing each
data block to the disk
twice
metadata Journaling
pg.504
add a new step before
Step 2 (commit): write data
blocks directly to their
final location; only
metadata is journaled
this is the most popular
approach
Other solutions?
copy-on-write
backpointer-based
consistency
optimistic crash
consistency
external storage
Hard Disk
magnetic tapes
flash storage
Flash-based SSDs
Flash drives
how do we evaluate
external storage drives?
I/O Time
pg.408
transfer time
rotation time
seek time
“AVERAGE” SEEK time
pg.411
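The three components add up: I/O time = seek time + rotation time + transfer time, and the effective I/O rate is the request size divided by that total. A small sketch follows; the function names are hypothetical, and the drive parameters used in the usage note are assumed values for illustration, not measurements.

```c
/* All times are in milliseconds. */

/* Average rotational delay: half of one full rotation. */
double avg_rotation_ms(double rpm)
{
    return 0.5 * 60000.0 / rpm;
}

/* Time to transfer `bytes` at a peak rate of `mb_per_s` (1 MB = 10^6 bytes). */
double transfer_ms(double bytes, double mb_per_s)
{
    return bytes / (mb_per_s * 1e6) * 1000.0;
}

/* Total I/O time = seek + rotation + transfer. */
double io_time_ms(double seek_ms, double rpm, double bytes, double mb_per_s)
{
    return seek_ms + avg_rotation_ms(rpm) + transfer_ms(bytes, mb_per_s);
}

/* Effective I/O rate in MB/s for a request of `bytes` that took `t_ms`. */
double io_rate_mb_s(double bytes, double t_ms)
{
    return (bytes / 1e6) / (t_ms / 1000.0);
}
```

For an assumed drive with a 4 ms average seek, 15000 RPM, and 125 MB/s peak transfer, a random 4096-byte read costs about 6 ms (4 ms seek + 2 ms rotation + roughly 0.03 ms transfer), which works out to well under 1 MB/s. Seek and rotation dominate small random requests.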
cost and other
engineering factors
SSD vs Hard drive in
depth comparison
HOW TO MAKE A LARGE,
FAST, RELIABLE DISK?
Redundant Array of
Inexpensive Disks
(RAID) pg. 421
How is the address
space of a modern disk
organized?
a drive consists of a large
number of sectors
(512-byte blocks), each of
which can be read or
written.
how to communicate
with software?
Persistent devices: I/O
HOW TO BUILD I/O
DEVICE-NEUTRAL OS
hard disk drivers pg.403
HOW TO STORE AND
ACCESS DATA ON DISK?
A Simple Disk Drive
pg.404
Disk Scheduling pg.412
understand disk
performance pg 409
Reading A File From
Disk
Writing to Disk pg.472
(software) device driver
pg.396
IO BUS
PCI
HOW TO COMMUNICATE
WITH DEVICES
I/O instructions
pg.395
Memory mapped I/O
pg.395
peripheral I/O
Keyboard
USB
mice
How do we know when
asynchronous I/O
completes?
Polling
ask the device every time
HOW TO AVOID THE
COSTS OF POLLING?
Interrupts
pg392
but interrupts are not
always better
Maskable interrupts
Nonmaskable
interrupts
HOW TO LOWER PIO
OVERHEADS
transferring a large chunk
of data to a device with
PIO wastes CPU time
Direct Memory Access (DMA)
A DMA engine is essentially a
very specific device within a
system that can orchestrate
transfers between devices
and main memory without
much CPU intervention.
pg.394
How does the OS use
the CPU in xv6?
The OS provides the illusion
of many, many CPUs, but
in hardware we only
have a few CPUs
how do we interact with
the many many virtual
CPUs?
The (Linux) Kernel API
Accessing hardware resources
kernel internal API
Kernel I/O Subsystem
How does the processor
give commands and
data to a controller to
accomplish an I/O
transfer?
I/O instructions - port
mapped I/O
data-in register
data-out register
status register
control register
device-control registers
are mapped into the
address space of the
processor.
Memory-Mapped I/O
large data transfer?
Direct Memory Access
(DMA)
direct virtual memory
access (DVMA)
Nonblocking and
Asynchronous I/O
every time we want to do
something with our virtual
memory and CPU. We use
the kernel API
We create a process
How does the Linux
kernel communicate
with a process?
UNIX SIGNALS
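A minimal sketch of signal delivery, assuming a POSIX system and a single-threaded program; the handler and helper names are hypothetical. The process registers a handler with sigaction(), then sends itself SIGUSR1 with raise(); the kernel interrupts normal execution to run the handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* sig_atomic_t is the only integer type guaranteed safe to write in a handler. */
static volatile sig_atomic_t got_usr1 = 0;

static void on_usr1(int signo)
{
    (void)signo;
    got_usr1 = 1;
}

/* Install the handler, signal ourselves, and report whether the handler ran.
 * Returns 1 if it ran, 0 if not, -1 on error. */
int signal_demo(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;
    got_usr1 = 0;
    /* raise() does not return until the handler has finished. */
    if (raise(SIGUSR1) != 0)
        return -1;
    return got_usr1;
}
```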
how do we isolate
different processes
from one another?
Process Memory Layout
Namespaces
mnt (mount points, filesystems)
namespace
process ID (PID)
namespace
net (network stack)
namespace
interprocess
communication
(System V IPC)
namespace
A UNIX Time‑Sharing
(UTS) (hostname)
namespace system
calls
create
clone()
new process and
namespace
unshare()
creates new namespace
termination
exit()
fork()
join existing namespace
setns()
user
namespace
kernel-userspace API
System Interface
POSIX API for
POSIX-based systems
(including virtually
all versions of UNIX,
Linux, and Mac OS X)
the C standard library
standard wrapper to access
the system interface
The Process API
wrapper
wait()
fork()
exec()
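A sketch of how fork(), exec(), and wait() fit together, assuming a POSIX system with /bin/sh available; run_shell is a hypothetical helper name. It uses execl() from the exec family and waitpid(), the variant of wait() that waits for one specific child. This is essentially what a shell does for each command.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run `cmd` through /bin/sh in a child process.
 * Returns the child's exit status, or -1 on error or abnormal termination. */
int run_shell(const char *cmd)
{
    pid_t pid = fork();                /* duplicate the calling process */
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: replace this process image with the shell. */
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);                    /* reached only if execl() failed */
    }
    /* Parent: block until the child terminates. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```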
(UNIX) shell
Interacting with the
computer by writing a C
program
(shell is just a user
program)
Other system calls not
part of standard library
Process control
File management
Device management
Information
maintenance
Communication
Protection
threads
thread creation
pthread_create() pg.280
int pthread_join
Thread Completion
pthread_join()
Thread locks
int pthread_mutex_lock(pthread_mutex_t *mutex);
int pthread_mutex_unlock(pthread_mutex_t *mutex);
pg.285
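A minimal sketch tying the thread calls together, assuming POSIX threads; the thread count, iteration count, and function names are illustration choices. Several threads increment a shared counter, and the mutex makes each increment a critical section so no update is lost.

```c
#include <pthread.h>

#define NTHREADS 4
#define NITERS   100000

static long counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < NITERS; i++) {
        pthread_mutex_lock(&counter_lock);    /* enter critical section */
        counter++;
        pthread_mutex_unlock(&counter_lock);  /* leave critical section */
    }
    return NULL;
}

/* Start NTHREADS workers, wait for all of them, and return the final count.
 * Returns -1 if a thread could not be created. */
long run_counter(void)
{
    pthread_t tid[NTHREADS];
    int started = 0;
    counter = 0;
    while (started < NTHREADS &&
           pthread_create(&tid[started], NULL, worker, NULL) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);           /* thread completion */
    return started == NTHREADS ? counter : -1;
}
```

Without the lock/unlock pair the final value would usually fall short of NTHREADS * NITERS, because counter++ is a load, an add, and a store that can interleave across threads.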
HOW TO PROVIDE
SUPPORT FOR
SYNCHRONIZATION
with the virtual many
many CPUs?
Thread API
concepts
multithreading
concurrency
Asynchrony
Lock
a lock or mutex (from
mutual exclusion) is a
synchronization
primitive: a mechanism
that enforces limits on
access to a resource
when there are many
threads of execution.
HOW TO BUILD A LOCK
criteria
Controlling Interrupts
Test And Set (Atomic
Exchange)
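A sketch of a spin lock built on test-and-set, assuming a C11 compiler with <stdatomic.h>. The spinlock_t type and function names are hypothetical. atomic_flag_test_and_set returns the old value and sets the flag in one atomic step, which is the hardware primitive the lock relies on.

```c
#include <stdatomic.h>

typedef struct {
    atomic_flag held;  /* set = locked, clear = free */
} spinlock_t;

void spin_init(spinlock_t *l)
{
    atomic_flag_clear(&l->held);
}

/* Try once: returns 1 if the lock was acquired, 0 if it was already held. */
int spin_trylock(spinlock_t *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Spin until the old value was "clear", meaning this caller took the lock. */
void spin_lock(spinlock_t *l)
{
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;  /* busy-wait */
}

void spin_unlock(spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

This gives mutual exclusion but no fairness guarantee, and spinning wastes CPU time if the holder is descheduled.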
How does the OS use
memory in xv6?
Memory abstraction on
the OS
how do we organize
memory for each
process?
Every process has an
address space separated
into 2 parts
The kernel space
(TOP OF ADDRESS SPACE)
which is the region
where the code of the
kernel is stored and
executes.
The user space
set of locations where
normal user processes
run (i.e., everything
other than the kernel).
The role of the kernel is
to keep applications
running in this space
from interfering with each
other and with the machine.
How do we implement
Multilevel page tables
in xv6
mmap()
malloc, free
How do we prevent
corrupted memory during
crashes?
shadow paging
similar to journaling
see journaling node
How do we write code
for each process?
base and bounds
segmentation
(generalized base and
bounds)
paging
(TLB) Multilevel page table
TLB
a process will never
accidentally encounter
the wrong translations
in the TLB
HOW TO MANAGE TLB
CONTENTS ON A
CONTEXT SWITCH
flush the TLB on context
switches
pg.191
when installing a new
entry in the TLB, we
have to replace an old
one, and thus the
question: which one to
replace?
HOW TO DESIGN TLB
REPLACEMENT POLICY
least-recently-used
pg. 192
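A toy model of a TLB with least-recently-used replacement and a flush operation for context switches. This is a software sketch only: the 4-entry size, the logical clock used to track recency, and all names are assumptions for illustration, and real hardware typically approximates LRU rather than tracking it exactly.

```c
#define TLB_ENTRIES 4

typedef struct {
    int           valid;
    unsigned      vpn;        /* virtual page number */
    unsigned      pfn;        /* physical frame number */
    unsigned long last_used;  /* logical time of the last hit or insert */
} tlb_entry_t;

typedef struct {
    tlb_entry_t   e[TLB_ENTRIES];
    unsigned long clock;      /* advances on every access */
} tlb_t;

/* Look up `vpn`. On a hit, store the frame in *pfn and return 1; else return 0. */
int tlb_lookup(tlb_t *t, unsigned vpn, unsigned *pfn)
{
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (t->e[i].valid && t->e[i].vpn == vpn) {
            t->e[i].last_used = ++t->clock;
            *pfn = t->e[i].pfn;
            return 1;
        }
    }
    return 0;
}

/* Install a translation after a miss: use a free slot if there is one,
 * otherwise evict the least recently used entry. */
void tlb_insert(tlb_t *t, unsigned vpn, unsigned pfn)
{
    int victim = 0;
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (!t->e[i].valid) {
            victim = i;
            break;
        }
        if (t->e[i].last_used < t->e[victim].last_used)
            victim = i;
    }
    t->e[victim].valid = 1;
    t->e[victim].vpn = vpn;
    t->e[victim].pfn = pfn;
    t->e[victim].last_used = ++t->clock;
}

/* Context switch: invalidate everything so the next process cannot
 * see the previous process's translations. */
void tlb_flush(tlb_t *t)
{
    for (int i = 0; i < TLB_ENTRIES; i++)
        t->e[i].valid = 0;
}
```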
the page data structure
problems
The Hardware
the CPU
Heterogeneous
Processors
How do we build a
heterogeneous
computer?
Helios
A kinda Multikernel
Limitations
limited set of
applications. Difficult to
implement satellite
kernels
need new compiler
support for
new platforms
what does it provide?
Simplify app
development,
deployment, and tuning
Provide single
programming model for
heterogeneous systems
How does it work?
Satellite kernels: Same
OS abstraction
everywhere
Remote message
passing: Transparent
IPC between kernels
Affinity Metrics: Easily
express arbitrary
placement policies to
OS
positive affinity
processes should be
colocated in that stack
negative affinity
processes should be
on different kernels
self-reference affinity
represents copies of
the process measure
2-phase compilation:
Run apps on arbitrary
devices
priority algorithm
makes decisions
built on Singularity OS
single address space!
Same ISA but different
extensions or
microarchitecture: ARM
big.LITTLE, Xeon Phi,
Intel Sunny Cove
Re-configurable FPGAs
Different ISA on same
chip: AMD integrating
x86 and ARM
Accelerators such as
GPUs and TPUs
RAM
assembly code concepts
assembly code -
compilers
you still need to
convert assembly code
to machine code
How do we actually use
registers in modern
computers?
Compiler takes care of
registers for you
x86-64 registers
virtual machine
I want the benefits of a
VM but it's too large and
slow!
containers
not full VM but kinda
implementation
Docker
How do we specify
instructions for set up?
Dockerfile
where do I find images
ready to go for specific
applications ?
Docker Hub of useful
Docker images
How does it work?
kernel namespaces
A lightweight way to
virtualize a process
see namespace node
CGroups
control groups
what does CGroups
provide?
Resource limits
Accounting
Control
Prioritization
How are CGroups
implemented?
few kernel additions
none critically impact
performance
A new file system of
type "cgroup" (VFS)
Systemwide:
/proc/cgroups
For each process:
/proc/pid/cgroup
UnionFS
what does UnionFS
provide?
several containers can
share common data
Writes to one container
do not affect another
On write, UnionFS saves
the overwritten data
to a new path
specific to the container
Manage multiple
containers
Deploy containers in
cluster
Kubernetes
what are the benefits of
Kubernetes?
Service discovery and
load balancing
Storage orchestration
Kubernetes
Automated rollouts and
rollbacks
Automatic bin packing
Self-healing
Secret and
configuration
management
Docker Swarm
VM vs Containers
why
MULTIPLEXING AND
EMULATION
Popek and Goldberg formalized the relationship
between a virtual machine and hypervisor
(which they call VMM)
Virtualization in
Computer Architecture
(emphasis is on
resource allocation)
Bare-metal Hypervisor
(type-1)
How do we handle I/O
Direct access
a virtual machine with a
dedicated physical I/O
device can access the entire
physical memory using
DMA operations. An
IOMMU protects against
this vulnerability
IOMMU
SR-IOV (Single Root Input Output
Virtualization)
widely used today to
achieve low latency
networking.
DPDK
Xen
trap and emulate
paravirtualization
Microsoft Hyper-V
x86 was not
virtualizable
Intel® Virtualization
Technology (VT-x)
what hardware
modifications are
required?
root mode
VMware
ESX Server
how do we handle
memory
Virtualization within
Operating Systems
(emphasis is on
resource allocation)
Hosted Hypervisor
(type-2)
how do we handle
memory
VirtualBox: the host OS has
no idea about the VMM; it's
just another application
QEMU/Linux KVM
(Kernel-based Virtual
Machine): full system
simulator
Binary translation
VMware Workstation
(Year 2000)
binary translation
How do we handle I/O
interposition
paravirtualization
x86 not virtualizable by
Popek and Goldberg
conditions
OS Architectures
Unikernel
implementation
problem
How should we
structure an OS for
future multicore
systems?
Solution
structure the OS as a
distributed system
Multikernel
implementation
Barrelfish project
support x86-64
multiprocessor
will support ARM soon
open sourced
problem
How should we
structure an OS for
future multicore
systems?
scalability to many
cores
current-day core interconnects restrict
communication to neighboring cores
heterogeneity and
hardware diversity
we have specialized
chips for specialized
functions. But these don't
communicate so well
Solution
structure the OS as a
distributed system
explicit inter-core
communication
decouple system
structure from
inter-core communication
naturally supports
heterogeneous cores,
non-coherent
interconnects (PCIe)
Intel 80-core
see Hardware:
heterogeneous node
all communication with
messages (no shared
state)
Tile64
make OS structure
hardware-neutral
view state as replicated
naturally supports
domains with no shared
memory
naturally supports
changes to running
cores
Exokernel
implementation
problem
monolithic kernel is
general purpose and
not optimized for
applications
monolithic kernel
runs device drivers at the
same privilege as the rest
of the OS
device drivers are written
by third parties and are
often buggy
bug in device driver
brings down entire
kernel
Solution
let applications manage
physical resources.
Separate policy from
mechanism. Kernel only
provides safety and
mechanism to safely
manage resources
expose allocation
ensure protection
secure bindings
expose names
track ownership
resources
expose revocation
revoking access to
resources
abort protocols
visible resource
revocation
how does the
application manage
resources?
packet filters written by
application using
primitives
Monolithic Kernel