Computer and Machine Vision Lecture Week 3

January 27, 2014

© Sam Siewert

Outline of Week 3
Processing Images and Moving Pictures – High Level View and Computer Architecture for it
Linux Platforms for Computer/Machine Vision
I/O, Memory and Processing Challenges

Old School Moving Picture Media and Cameras
NTSC OTA (1941, 1953 color, 2009 dead)
– Analog, Interlaced, Continuous Broadcast Transmission or CCTV (Closed Circuit TV)
– Coax Cable or Tuner with Immediate CRT Display
– No Buffers, No Routing, No De-mux
– No Compression
Analog Cable
AM/FM OTA
Film Projectors

Modern Digital Cameras
Camera Link
– High Frame Rates
– High Data Rates and Resolutions
– Industry Standard for Machine Vision Automation
– E.g. Inspection Systems
– E.g. Sony, IDT, National Instruments
SD-SDI and HD-SDI
– Standard and High Definition Serial Digital Interface
– Standard for Studios, Broadcast
Digital Cinema
– Red Camera
– 1080p, 2K, 4K Resolutions and Much Higher
– Automated Digital Delivery and Projection
Webcams and Mobile Phone Cameras
– Very Low Cost
– Proprietary
– Performance Varies Dramatically

Differences Analog vs Digital
Encoding for Transmission
– Digital Allows for Image Processing
– Adds Latency
– Requires Compression for Packet Switched Networks and Storage
Routed (Diversely), Buffered
Compressed (MPEG, JPEG) to Lower Bit-rates
Multiplexed (Shares Transmission Carrier for Audio, Video, Channels)
Transported by IP (Large Packets)
Continuous Transmission – Analog or Constant Bit-Rate / Frame-Rate

E.g. UAV Latency and Jitter
"Verification of Video Frame Latency Telemetry for UAV Systems Using a Secondary Optical Method", Sam Siewert, Muhammad Ahmad, Kevin Yao

NTSC (Analog TV)

http://en.wikipedia.org/wiki/File:Ntsc_channel.svg

AM Video to CRT
FM Audio
Chroma Added Later
Odd/Even Lines (Interlaced)
29.97 FPS (30 before color)
Vertical Blanking (CRT Retrace Time, Closed Captioning)
525 Lines, 262.5 per Field, 60 Fields per Second (59.94 after color)
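The NTSC numbers above are related by simple arithmetic, which the following minimal C sketch works through. It assumes the color frame rate is exactly 30000/1001 frames per second (the usual definition behind "29.97"); the function names are hypothetical and chosen only for illustration.

```c
#include <assert.h>

#define NTSC_LINES_PER_FRAME  525
#define NTSC_FIELDS_PER_FRAME 2     /* odd field + even field (interlaced) */

/* Color NTSC frame rate, assumed to be exactly 30000/1001 (about 29.97 FPS). */
double ntsc_frame_rate(void)
{
    return 30000.0 / 1001.0;
}

/* Two interlaced fields per frame gives about 59.94 fields per second. */
double ntsc_field_rate(void)
{
    return ntsc_frame_rate() * NTSC_FIELDS_PER_FRAME;
}

/* 525 lines split across two fields gives 262.5 lines per field. */
double ntsc_lines_per_field(void)
{
    return (double)NTSC_LINES_PER_FRAME / NTSC_FIELDS_PER_FRAME;
}

/* Horizontal line rate in lines per second (about 15.7 kHz). */
double ntsc_line_rate(void)
{
    return ntsc_frame_rate() * NTSC_LINES_PER_FRAME;
}
```

The half line per field is what makes the odd and even fields interleave on the CRT.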


Linux in Computer Vision
Embedded Solutions
– Texas Instruments OMAP (Beagle xM, Bone)
– Numerous ARM SoCs (NVIDIA, Qualcomm, Broadcom, …)
Scalable Solutions
– Multi-Core (Xeon Phi)
– Vector Processing
– CUDA, OpenCL GPU and GPGPU
Computer and Machine Vision is I/O, Memory and Processing Intensive

Camera Interfaces
CCD (Charge Coupled Device) or CMOS (Complementary Metal Oxide Semiconductor) Detector
– Integration Time for Photo-sensitive Elements in Array (to Build up Charge)
– Read-out Time to Sample Elements in Array
Luminance and Chroma Analog to Digital Conversion
Double Buffer for Read-out + Processing
Frame Capture
– http://www.cse.uaa.alaska.edu/~ssiewert/a485_doc/FrameCapture-Chips/
– Host Interface over PCI Bus or USB
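The double buffer for read-out plus processing can be sketched as a two-slot "ping-pong" structure: read-out fills one slot while processing reads the other, and the roles swap at each frame boundary. This is a minimal single-threaded sketch, not a driver implementation; the type `double_buffer_t`, the `db_*` functions and the tiny frame size are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FRAME_BYTES (8 * 8)   /* hypothetical tiny 8x8, 8-bit frame */

typedef struct {
    uint8_t buf[2][FRAME_BYTES];
    int write_idx;            /* slot currently being filled by read-out */
} double_buffer_t;

void db_init(double_buffer_t *db)
{
    memset(db, 0, sizeof(*db));
}

/* Slot that the sensor read-out should fill next. */
uint8_t *db_write_buffer(double_buffer_t *db)
{
    return db->buf[db->write_idx];
}

/* Slot holding the most recently completed frame, safe to process. */
const uint8_t *db_read_buffer(const double_buffer_t *db)
{
    return db->buf[1 - db->write_idx];
}

/* Call when read-out of a frame completes: the filled slot becomes readable. */
void db_swap(double_buffer_t *db)
{
    assert(db->write_idx == 0 || db->write_idx == 1);
    db->write_idx = 1 - db->write_idx;
}
```

In a real capture path the swap would be driven by the frame-complete interrupt or a driver callback, and access would need synchronization between the capture and processing contexts.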


Digital Video Transport QoS
Latency
– To Tune in a Program, Turn-on
– To Deliver a Video Frame or Audio PCM Sample
– To Start, FF, REW, Start-Over, Pause
Bandwidth
– Resolution, Lossy/Lossless Compression, High Motion
– Pixel Encoding for Color
– Frame Rate
– Constant Bit-rate Transport?
– Variable Bit-rate Transport and Encoding?
Jitter
– Decode and Presentation Rates
– Elasticity in Decode to Presentation Buffering Necessary
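To make the bandwidth factors concrete (resolution, pixel encoding, frame rate), here is a small C sketch that computes the uncompressed bit-rate of a stream and the compression ratio needed to fit a given transport budget. The function names and the example numbers in the note are illustrative assumptions, not values from the lecture.

```c
#include <assert.h>
#include <stdint.h>

/* Uncompressed video bit-rate in bits per second. */
uint64_t raw_video_bps(uint32_t width, uint32_t height,
                       uint32_t bits_per_pixel, uint32_t fps)
{
    return (uint64_t)width * height * bits_per_pixel * fps;
}

/* Compression ratio (rounded up) needed to fit raw_bps into channel_bps. */
uint64_t required_compression_ratio(uint64_t raw_bps, uint64_t channel_bps)
{
    assert(channel_bps > 0);
    return (raw_bps + channel_bps - 1) / channel_bps;
}
```

For example, `raw_video_bps(1920, 1080, 24, 30)` is 1,492,992,000 bits per second, so a hypothetical 10 Mbit/s channel would need roughly 150:1 compression.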


Linux System Options (Linux for Image Processing, Camera Interfacing and Computer Vision)


Processing Outline
Many-Core Linux Host(s)
– Intel Atom
– ARM
– Xeon
GP-GPU Vector Processing
PCI-E Co-Processors
NVIDIA Tesla/Fermi
AMD ATI
NPTL – Native POSIX Threads Library
NPTL Example Code Walkthrough

Conceptual View of RT Resources
Three-Space View of Utilization Requirements
– CPU Margin?
– IO Latency (and Bandwidth) Margin?
– Memory Capacity (and Latency) Margin?
[Figure: three-axis utilization space with axes CPU-Utility, IO-Utility, Memory-Utility]
Upper Right Front Corner – Low-Margin
Origin – High-Margin
Mobile – Must Consider Battery Life Too (Power)

Processing – Initial Focus
Processing and Scaling Frame Transformation, Encode, Decode is Critical
Memory for Buffering (Frame Transformations, CPU Integrated or GPU Offloaded – e.g. Linux VDPAU)
I/O for Networking (Transport)
I/O for Storage (On-Demand, Post, Non-Linear Editing)

Flynn’s Computer Architecture Taxonomy
– Single Instruction, Single Data: SISD (Traditional Uniprocessor)
– Multiple Instruction, Single Data: MISD (Voting schemes and active-active controllers)
– Single Instruction, Multiple Data: SIMD (e.g. SSE 4.2, GPGPU, Vector Processing)
– Multiple Instruction, Multiple Data: MIMD (Distributed systems (MPMD), Clusters with MPI/PVM (SPMD), AMP/SMP)
GPC has gone MIMD with SIMD Instruction Sets and SIMD Offload (GP-GPU)
NUMA vs. UMA (Trend away from UMA to NUMA or MCH vs. IOH)
SMP with One OS (Shared Memory, CPU-balanced Interrupt Handling, Process Load Balancing, Multi-User, Multi-Application, CPU Affinity Possible)
MIMD - Single Program Multi-Data vs. Multi-Program Multi-Data

Computer and Machine Vision
Treated as a Real-time and/or Interactive System
– Requires Predictable Response (By Deadline)
– Rate Monotonic
– Earliest Deadline First
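Both policies named above have simple utilization-based feasibility checks on a single CPU. The sketch below assumes periodic services with deadline equal to period. For EDF it uses the exact test U <= 1. For Rate Monotonic it uses the hyperbolic bound, which is a sufficient (not necessary) test swapped in here because it avoids floating-point roots; the classic Liu and Layland bound n(2^(1/n) - 1) could be used instead. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    double wcet;    /* C_i: worst-case execution time */
    double period;  /* T_i: period, with deadline assumed equal to period */
} service_t;

/* Total CPU utilization U = sum(C_i / T_i). */
double total_utilization(const service_t *s, size_t n)
{
    double u = 0.0;
    for (size_t i = 0; i < n; i++) {
        assert(s[i].period > 0.0);
        u += s[i].wcet / s[i].period;
    }
    return u;
}

/* EDF on one CPU: feasible if and only if U <= 1 (deadline == period). */
int edf_feasible(const service_t *s, size_t n)
{
    return total_utilization(s, n) <= 1.0;
}

/* Rate Monotonic sufficient test (hyperbolic bound): prod(U_i + 1) <= 2.
   Returning 0 means "not proven feasible", not "infeasible". */
int rm_sufficient(const service_t *s, size_t n)
{
    double prod = 1.0;
    for (size_t i = 0; i < n; i++) {
        prod *= s[i].wcet / s[i].period + 1.0;
    }
    return prod <= 2.0;
}
```

A service set that fails `rm_sufficient` but passes `edf_feasible` would need an exact analysis (for example worst-case response time analysis) before concluding anything about Rate Monotonic.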


CPU Scheduling Taxonomy
[Figure: scheduling taxonomy tree rooted at Execution Scheduling. Node labels: Local-Uniprocessor; Global-MP; Dynamic; Static; Symmetric (SMP OS); SMT (Micro-Parallel); Asymmetric Distributed (AMP); Hybrid; Preemptive; Non-Preemptive; Fixed-Priority (Rate Monotonic, Deadline Monotonic); Dynamic-Priority (EDF/LLF); RR Timeslice (desktop); Batch (FCFS, SJN); Cooperative; Dataflow; Heuristic; Multi-Frequency Executives; Co-Routine; Continuation Function]
(Preemptive, Non-Preemptive Subtree Under Each Global-MP Leaf)

Response Latency
Ci WCET
Input/Output Latency
Interference Time
Response Time = Time_Actuation – Time_Sensed (From Release to Response)
[Figure: response timeline along the Time axis. Events: Event Sensed, Interrupt, Dispatch, Preemption, Dispatch, Completion (IO Queued), Actuation (IO Completion). Intervals: Input-Latency, Dispatch-Latency, Execution, Interference, Execution, Output-Latency]
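The response time definition (actuation time minus sensed time) can be measured in software by sampling a monotonic clock at both points. The sketch below assumes POSIX `clock_gettime` with `CLOCK_MONOTONIC`; the helper names and the idea of wrapping a service routine in a function pointer are illustrative assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Difference (stop - start) in nanoseconds. */
int64_t timespec_delta_ns(const struct timespec *start,
                          const struct timespec *stop)
{
    return (int64_t)(stop->tv_sec - start->tv_sec) * 1000000000LL
         + (int64_t)(stop->tv_nsec - start->tv_nsec);
}

/* Time a hypothetical service routine: the first sample stands in for
   "event sensed", the second for "actuation". */
int64_t measure_response_ns(void (*service)(void))
{
    struct timespec sensed, actuated;

    assert(service != NULL);
    clock_gettime(CLOCK_MONOTONIC, &sensed);
    service();
    clock_gettime(CLOCK_MONOTONIC, &actuated);
    return timespec_delta_ns(&sensed, &actuated);
}
```

A software timestamp only approximates the true sensed and actuation instants, since input and output latency outside the process are not visible to it.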

SIMD Vector Instructions
Intel MMX, SSE 1, 2, 3, 4.x
Code Generation
Using SIMD Extensions to Accelerate Algorithms (Edge Enhancement)
– http://software.intel.com/en-us/articles/using-intelstreaming-simd-extensions-and-intel-integratedperformance-primitives-to-accelerate-algorithms/
[Figure: PSF]

Offload, Co-Proc, Vector Proc
1. GPU (Graphics Processing Units)
– Evolved for Consumer CGI and Games
  Physics Engines
  3D Rendering + Texture (4D Vector Operations)
  Game Engines and Simulation
  HD Output: HDMI, HD-SDI, Headless GP-GPU
– Higher End Used for Digital Cinema / Post Production, Broadcast
  PNY Quadro FX
  NVIDIA CUDA for Post
– GP-GPU Being Used to Accelerate Encode, Transcode, Trans-rate, etc. - http://www.elementaltechnologies.com/
2. Built-In SIMD Instruction Set Extensions
– Intel SSE

GP-GPU, What Is It?
Ideal for Large Bitwise, Integer, and Floating Point Vector Math
Flynn’s Taxonomy
SIMD Architecture often leverages GP-GPU Co-Processors or Cell for MPMD
– Single Instruction/Prog, Single Data: SISD (Traditional Uniprocessor)
– Multiple Instruction, Single Data: MISD (Voting schemes and active-active controllers)
– Single Instruction/Prog, Multiple Data: SIMD (SSE 4.2, Vector Processing), SPMD (Single Program Multiple Data), GP-GPU
– Multiple Instruction, Multiple Data: MIMD (Distributed systems (MPMD), Clusters with MPI/PVM (SPMD), AMP/SMP)

SSE – Streaming SIMD Extensions
128-bit registers known as XMM0 through XMM7 (XMM0 through XMM15 in 64-bit mode)
Large Operands and Operators (Multi-Word)
E.g. 128-bit XOR of Two Operands
Multiple Multiply and Accumulate Operations for Floating Point (DSP Kernel Operations)
– E.g. 4 Component Vector addition
– 4 Single Precision Pixel Multiply and Accumulate in Single Instruction

Scalar version: 16 operations to load 2 operands, add, store
    vec_res.x = v1.x + v2.x;
    vec_res.y = v1.y + v2.y;
    vec_res.z = v1.z + v2.z;
    vec_res.w = v1.w + v2.w;

SSE version: 3 SSE operations to load, add, store
    movaps xmm0,address-of-v1          ;xmm0=v1.w | v1.z | v1.y | v1.x
    addps  xmm0,address-of-v2          ;xmm0=v1.w+v2.w | v1.z+v2.z | v1.y+v2.y | v1.x+v2.x
    movaps address-of-vec_res,xmm0
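The same four-component add can be written from C with compiler intrinsics instead of hand-written assembly. This is a hedged sketch: it assumes an x86 compiler that provides `<xmmintrin.h>`, falls back to scalar code elsewhere, and uses unaligned loads and stores (`movups`) so that no 16-byte alignment is required, unlike the `movaps` form. The `vec4_t` type and `vec4_add` name are hypothetical.

```c
#include <assert.h>

typedef struct { float x, y, z, w; } vec4_t;

#if defined(__SSE__)
#include <xmmintrin.h>

/* One packed add: load 4 floats, add 4 lanes at once, store 4 floats. */
void vec4_add(vec4_t *res, const vec4_t *v1, const vec4_t *v2)
{
    assert(res && v1 && v2);
    __m128 a = _mm_loadu_ps(&v1->x);          /* a = v1.w | v1.z | v1.y | v1.x */
    __m128 b = _mm_loadu_ps(&v2->x);
    _mm_storeu_ps(&res->x, _mm_add_ps(a, b)); /* four lane-wise sums */
}
#else
/* Scalar fallback for targets without SSE. */
void vec4_add(vec4_t *res, const vec4_t *v1, const vec4_t *v2)
{
    assert(res && v1 && v2);
    res->x = v1->x + v2->x;
    res->y = v1->y + v2->y;
    res->z = v1->z + v2->z;
    res->w = v1->w + v2->w;
}
#endif
```

With optimization enabled, many compilers will auto-vectorize the scalar form as well, so the intrinsic version is mainly useful when explicit control over the generated instructions is wanted.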

Scheduling Parallel/Cluster HW
MIMD
– OS SMP threading, provides load balancing, affinity operations, routable interrupts (e.g. MSI-X), e.g. NPTL
– RTOS AMP is most often used in Embedded Systems
MPMD
– OpenCL, CUDA, DirectCompute (DirectX extension)
– Intel OpenMP, Linux Cluster, MPI

How Does NPTL Work?
No Thread Manager or M-on-N Mapping
– Previous POSIX Threading Model
– Manager Becomes Bottleneck
– Two-Level Scheduling Not Deterministic
– Many Pthreads (M) to N Kernel Threads Still an Issue
– O(n) Scheduling for each Manager
Direct Mapping of User to Kernel Thread or 1-to-1
– User Space Pthread Maps Directly onto Kernel Thread (Requires Root privilege)
– Deterministic (Non-Determinism due to Kernel Preemptability Issues)
– O(1) Scheduling
Scheduling Policies Selectable
Similar to RTOS Tasking

Linux NPTL Scheduling Policies
Fixed Priority Preemptive
– SCHED_FIFO – This is Priority Preemptive
– SCHED_RR – This is Fair, but at Kernel Level
– SCHED_OTHER – This is OS default and should not be used for real-time threads
POSIX Threads have
– Policy (FIFO, RR, OTHER)
– Priority (RT min to RT max)
– Creation (Fork)
– Join (Wait for thread completion at rendezvous)
– Synchronization Methods
  Semaphores
  Message Queues
– Asynchronous Communication Methods
  Signals
  Queued Signals
POSIX RT Extensions Include
– Virtual Timer Services
– Signals Tied to Timer Services
– Priority Inversion Protection (Availability on Linux TBD)
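The "RT min to RT max" priority range depends on the policy and the platform, so it is usually queried at run time rather than hard-coded. The sketch below uses the POSIX calls `sched_get_priority_min` and `sched_get_priority_max`; the `prio_range_t` type and function names are hypothetical. Querying the range does not require any special privilege.

```c
#include <assert.h>
#include <sched.h>
#include <stdio.h>

typedef struct {
    int min;
    int max;
} prio_range_t;

/* Query the valid static priority range for a policy; returns 0 on success. */
int get_prio_range(int policy, prio_range_t *out)
{
    assert(out != NULL);
    int lo = sched_get_priority_min(policy);
    int hi = sched_get_priority_max(policy);
    if (lo == -1 || hi == -1) {
        return -1;   /* invalid policy */
    }
    out->min = lo;
    out->max = hi;
    return 0;
}

/* Print the ranges for the three policies discussed above. */
void print_policy_ranges(void)
{
    const int policies[] = { SCHED_FIFO, SCHED_RR, SCHED_OTHER };
    const char *names[]  = { "SCHED_FIFO", "SCHED_RR", "SCHED_OTHER" };

    for (int i = 0; i < 3; i++) {
        prio_range_t r;
        if (get_prio_range(policies[i], &r) == 0) {
            printf("%s: min=%d max=%d\n", names[i], r.min, r.max);
        }
    }
}
```

On Linux the real-time policies typically report a range of 1 to 99, while SCHED_OTHER reports 0 for both limits because it does not use static priorities.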


NPTL Coding Code Walk-through

July 7, 2004


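Thread Scheduling Policy
    /* Request SCHED_FIFO explicitly instead of inheriting the creator's policy. */
    pthread_attr_init(&rt_sched_attr);
    pthread_attr_setinheritsched(&rt_sched_attr, PTHREAD_EXPLICIT_SCHED);
    pthread_attr_setschedpolicy(&rt_sched_attr, SCHED_FIFO);

    /* Query the valid SCHED_FIFO priority range. */
    rt_max_prio = sched_get_priority_max(SCHED_FIFO);
    rt_min_prio = sched_get_priority_min(SCHED_FIFO);

    /* Run the calling process under SCHED_FIFO just below maximum priority
       (on Linux this needs root or CAP_SYS_NICE). */
    rt_param.sched_priority = rt_max_prio-1;
    rc=sched_setscheduler(getpid(), SCHED_FIFO, &rt_param);

    /* Report the contention scope of the attribute object. */
    pthread_attr_getscope(&rt_sched_attr, &scope);

    if(scope == PTHREAD_SCOPE_SYSTEM)
        printf("PTHREAD SCOPE SYSTEM\n");
    else if (scope == PTHREAD_SCOPE_PROCESS)
        printf("PTHREAD SCOPE PROCESS\n");
    else
        printf("PTHREAD SCOPE UNKNOWN\n");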


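Thread Creation and Join
    rc = pthread_create(&main_thread, &main_sched_attr, testThread, (void *)0);

    if (rc)
    {
        /* pthread functions return the error number and do not set errno,
           so report it with strerror() (from <string.h>) rather than perror(). */
        printf("ERROR; pthread_create() rc is %d (%s)\n", rc, strerror(rc));
        exit(-1);
    }

    /* Wait for the thread to complete, then release the attribute object. */
    pthread_join(main_thread, NULL);

    if((rc = pthread_attr_destroy(&rt_sched_attr)) != 0)
        printf("attr destroy: %s\n", strerror(rc));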
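For a version that can be compiled on its own, the following sketch creates a few worker threads with default attributes (so no root privilege is needed), joins them, and combines their results. It illustrates only the create and join pattern, not the real-time attribute setup; `worker`, `worker_arg_t`, `run_workers` and `NUM_WORKERS` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_WORKERS 4   /* hypothetical worker count */

typedef struct {
    int id;       /* input: worker index */
    long result;  /* output: written by the worker before it exits */
} worker_arg_t;

/* Each worker sums i * (id + 1) for i = 1..1000 into its own slot. */
static void *worker(void *p)
{
    worker_arg_t *arg = p;
    long sum = 0;
    for (int i = 1; i <= 1000; i++) {
        sum += (long)i * (arg->id + 1);
    }
    arg->result = sum;
    return NULL;
}

/* Create the workers, join them all, and return the combined result,
   or -1 if any thread could not be created. */
long run_workers(void)
{
    pthread_t tid[NUM_WORKERS];
    worker_arg_t args[NUM_WORKERS];
    long total = 0;
    int created = 0;

    for (int i = 0; i < NUM_WORKERS; i++) {
        args[i].id = i;
        args[i].result = 0;
        if (pthread_create(&tid[i], NULL, worker, &args[i]) != 0) {
            break;
        }
        created++;
    }
    /* Join is the rendezvous: results are safe to read only after it returns. */
    for (int i = 0; i < created; i++) {
        pthread_join(tid[i], NULL);
        total += args[i].result;
    }
    return created == NUM_WORKERS ? total : -1;
}
```

Build with the compiler's thread option (for example `-pthread` with GCC or Clang). Each worker writes only its own `worker_arg_t`, so no mutex is needed in this sketch.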


Issues Beyond Policy and Feasibility
Throughput
Latency
How do they Differ?
E.g. Frame Rate vs. Time to First Frame
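One way to see the difference is a multi-stage frame pipeline (for example capture, transform, encode). Under the assumption that the stages are fully pipelined and run concurrently, time to first frame is the sum of the stage times, while steady-state frame rate is limited only by the slowest stage. The function names and stage times below are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Latency: time for one frame to traverse every stage, in milliseconds. */
double time_to_first_frame_ms(const double *stage_ms, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        total += stage_ms[i];
    }
    return total;
}

/* Throughput: with stages pipelined, frames per second is set by the
   slowest stage alone. */
double pipelined_fps(const double *stage_ms, size_t n)
{
    assert(n > 0);
    double slowest = stage_ms[0];
    for (size_t i = 1; i < n; i++) {
        if (stage_ms[i] > slowest) {
            slowest = stage_ms[i];
        }
    }
    assert(slowest > 0.0);
    return 1000.0 / slowest;
}
```

With hypothetical stage times of 10, 20 and 5 ms, the first frame appears after 35 ms, yet the pipeline sustains 50 frames per second. Adding a stage raises latency without necessarily lowering throughput.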


Digital Video (Quick Reminders)


Simple Encode/Decode is Processing Intensive
GPU Co-Processors Can Offload CPU
Example with Mplayer and VDPAU (Video Decode and Presentation API for Unix) for Linux
Core Loading with Mplayer
[Figure: VDPAU MPEG Decode (Load balancing and offload)]
[Figure: Dual-Core SW Decode (Load balancing)]

Discussion – What Does Eye See?
Ewald Hering (1872), Opponent Colors (R/G, Y/B)
– Red and Green Opponent Colors – Can’t See Both Simultaneously
– Yellow and Blue Opponent Colors
Color Models
– RGB Cube
– HSV - Hue/Saturation/Value
  Hue – Similarity to R, G, Y, B
  Saturation – Color vs. Brightness
  Value – Low=Black, High=Color
– Luminance (Candela/Square-Meter) – Light Passing Through Area Forming a Solid Angle in A Direction
  Candela (Luminous Intensity) = Lumens/Steradian
  More Precise than “Brightness”
– Chrominance (“CrCb” or “UV” in YCrCb or YUV)
  U = Blue – Luminance (Y)
  V = Red – Luminance (Y)
– Wavelength Spectrum - ROYGBIV
[Figure: RGB Cube] http://en.wikipedia.org/wiki/File:RGB_Cube_Show_lowgamma_cutout_b.png
[Figure: HSV Cylinder] http://en.wikipedia.org/wiki/File:HSV_color_solid_cylinder_alpha_lowgamma.png
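The chrominance definitions (U as blue minus luminance, V as red minus luminance) can be sketched directly in C. The luma weights below are an assumption taken from ITU-R BT.601 (0.299, 0.587, 0.114); the slide itself does not specify them. Practical YUV and YCrCb encodings also scale and offset U and V, which this sketch leaves out. The `yuv_t` type and `rgb_to_yuv` name are hypothetical.

```c
#include <assert.h>

typedef struct {
    double y;  /* luminance */
    double u;  /* blue minus luminance (unscaled) */
    double v;  /* red minus luminance (unscaled) */
} yuv_t;

/* Convert normalized RGB (each component in 0.0..1.0) to unscaled Y, U, V.
   Luma weights are assumed from ITU-R BT.601. */
yuv_t rgb_to_yuv(double r, double g, double b)
{
    yuv_t out;

    assert(r >= 0.0 && r <= 1.0);
    assert(g >= 0.0 && g <= 1.0);
    assert(b >= 0.0 && b <= 1.0);

    out.y = 0.299 * r + 0.587 * g + 0.114 * b;
    out.u = b - out.y;
    out.v = r - out.y;
    return out;
}
```

Any gray input (r = g = b) yields U and V of approximately zero, which is why a luminance-only signal stays compatible with monochrome display.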


Frame Analysis and Image Processing
Resources for Raw Frame Data
Single Frame Viewing and Analysis
– GIMP (GNU Image Manipulation Program) – Single Frame Analysis and Transforms – http://www.gimp.org/downloads/
– Irfanview – Simple Viewer includes PPM – http://www.irfanview.com/
– Octave – Similar to MATLAB
Image Processing Libraries
– CImg – http://cimg.sourceforge.net/
– OpenCV (C/C++ and Python API) – http://opencv.org/

Practice with Linux
GIMP PPM and JPEG Frame Analysis
FFMPEG MPEG-4 DV to Frames
Sobel Image Transformation Real-Time
– http://www.cse.uaa.alaska.edu/~ssiewert/a485_code/capturetransformer/
Sobel Image Transformation Batch Mode
FFMPEG Re-encoding
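As a reference point for the Sobel exercises, here is a minimal sketch of a Sobel edge transform on an 8-bit grayscale frame. It is not the course's capturetransformer code. It assumes row-major pixel storage, approximates gradient magnitude as |Gx| + |Gy| (avoiding a square root), clamps to 255, and writes 0 to border pixels; the function name `sobel_gray` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Sobel edge magnitude on an 8-bit grayscale image (row-major).
   Uses |Gx| + |Gy| as the magnitude, clamped to 255; borders are set to 0. */
void sobel_gray(const uint8_t *in, uint8_t *out, int width, int height)
{
    assert(in != NULL && out != NULL && in != out);

    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            if (x == 0 || y == 0 || x == width - 1 || y == height - 1) {
                out[y * width + x] = 0;
                continue;
            }
            const uint8_t *p = in + y * width + x;

            /* Horizontal gradient: right column minus left column. */
            int gx = -p[-width - 1] + p[-width + 1]
                     - 2 * p[-1]    + 2 * p[1]
                     - p[width - 1] + p[width + 1];

            /* Vertical gradient: bottom row minus top row. */
            int gy = -p[-width - 1] - 2 * p[-width] - p[-width + 1]
                     + p[width - 1] + 2 * p[width]  + p[width + 1];

            int mag = abs(gx) + abs(gy);
            out[y * width + x] = (uint8_t)(mag > 255 ? 255 : mag);
        }
    }
}
```

For a color PPM frame, one option is to convert each pixel to luminance first and run the transform on that single plane.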
