Computer and Machine Vision Lecture Week 3
January 27, 2014
Sam Siewert
Outline of Week 3
Processing Images and Moving Pictures
– High Level View and Computer Architecture for it
Linux Platforms for Computer/Machine Vision
I/O, Memory and Processing Challenges
Old School Moving Picture Media and Cameras
NTSC OTA (1941; color 1953; U.S. analog shutoff 2009)
– Analog, Interlaced, Continuous Broadcast Transmission or CCTV (Closed Circuit TV)
– Coax Cable or Tuner with Immediate CRT Display
– No Buffers, No Routing, No De-mux
– No Compression
Analog Cable
AM/FM OTA
Film Projectors
Modern Digital Cameras
Camera Link
– High Frame Rates
– High Data Rates and Resolutions
– Industry Standard for Machine Vision
– Automation, E.g. Inspection Systems
– E.g. Sony, IDT, National Instruments
SD-SDI and HD-SDI
– Standard and High Definition Serial Digital Interface
– Standard for Studios, Broadcast
Digital Cinema
– Red Camera
– 1080p, 2K, 4K Resolutions and Much Higher
– Automated Digital Delivery and Projection
Webcams and Mobile Phone Cameras
– Very Low Cost
– Proprietary
– Performance Varies Dramatically
Differences
Analog vs Digital Encoding for Transmission
– Digital Allows for Image Processing
– Adds Latency
– Requires Compression for Packet Switched Networks and Storage
Routed (Diversely), Buffered
Compressed (MPEG, JPEG) to Lower Bit-rates
Multiplexed (Shares Transmission Carrier for Audio, Video, Channels)
Transported by IP (Large Packets)
Continuous Transmission
– Analog or Constant Bit-Rate / Frame-Rate
E.g. UAV Latency and Jitter
Verification of Video Frame Latency Telemetry for UAV Systems Using A Secondary Optical Method, Sam Siewert, Muhammad Ahmad, Kevin Yao
NTSC (Analog TV)
http://en.wikipedia.org/wiki/File:Ntsc_channel.svg
AM Video to CRT
FM Audio
Chroma Added Later
Odd/Even Lines (Interlaced)
29.97 FPS (30 before color)
Vertical Blanking (CRT Retrace Time, Closed Captioning)
525 Lines, 262.5 per Field, 60 Fields per Second (59.94 with color)
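The figures on this slide fit together arithmetically, and a short C sketch can make that explicit. It assumes the exact color NTSC frame rate of 30000/1001 frames per second (the usual definition of "29.97"); the function names are hypothetical and chosen only for illustration.

```c
#include <assert.h>

/* Color NTSC frame rate is exactly 30000/1001 frames per second (about 29.97). */
#define NTSC_FRAME_RATE_NUM  30000.0
#define NTSC_FRAME_RATE_DEN  1001.0
#define NTSC_LINES_PER_FRAME 525.0

/* Frames per second for color NTSC. */
double ntsc_frame_rate(void)
{
    return NTSC_FRAME_RATE_NUM / NTSC_FRAME_RATE_DEN;
}

/* Fields per second: two interlaced fields (odd and even lines) per frame. */
double ntsc_field_rate(void)
{
    return 2.0 * ntsc_frame_rate();
}

/* Horizontal line rate in Hz: every frame scans all 525 lines. */
double ntsc_line_rate_hz(void)
{
    return NTSC_LINES_PER_FRAME * ntsc_frame_rate();
}

/* Lines per field for an interlaced format (262.5 for 525 lines). */
double lines_per_field(double lines_per_frame)
{
    assert(lines_per_frame > 0.0);
    return lines_per_frame / 2.0;
}
```

Under these assumptions the line rate comes out near 15.734 kHz and the field rate near 59.94 Hz, which is why 60 fields per second is only the nominal figure for color broadcasts.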
Linux in Computer Vision
Embedded Solutions
– Texas Instruments OMAP (Beagle xM, Bone)
– Numerous ARM SoCs (NVIDIA, Qualcomm, Broadcom, …)
Scalable Solutions
– Multi-Core (Xeon Phi)
– Vector Processing
– CUDA, OpenCL GPU and GPGPU
Computer and Machine Vision is I/O, Memory and Processing Intensive
Camera Interfaces
CCD (Charge Coupled Device) or CMOS (Complementary Metal Oxide Semiconductor) Detector
– Integration Time for Photo-sensitive Elements in Array (to Build up Charge)
– Read-out Time to Sample Elements in Array
Luminance and Chroma
Analog to Digital Conversion
Double Buffer for Read-out + Processing
Frame Capture
– http://www.cse.uaa.alaska.edu/~ssiewert/a485_doc/FrameCapture-Chips/
– Host Interface over PCI Bus or USB
Digital Video Transport QoS
Latency
– To Tune in a Program, Turn-on
– To Deliver a Video Frame or Audio PCM Sample
– To Start, FF, REW, Start-Over, Pause
Bandwidth
– Resolution, Lossy/Lossless Compression, High Motion
– Pixel Encoding for Color
– Frame Rate
– Constant Bit-rate Transport?
– Variable Bit-rate Transport and Encoding?
Jitter
– Decode and Presentation Rates
– Elasticity in Decode to Presentation Buffering Necessary
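The bandwidth items above (resolution, pixel encoding, frame rate) multiply directly into an uncompressed bit rate. The sketch below is a minimal illustration of that arithmetic; the function names and the example channel capacity are assumptions, not values from the lecture.

```c
#include <assert.h>
#include <stdint.h>

/* Uncompressed video bit rate: pixels per frame x bits per pixel x frames per second. */
uint64_t raw_video_bits_per_second(uint32_t width, uint32_t height,
                                   uint32_t bits_per_pixel,
                                   uint32_t frames_per_second)
{
    assert(width > 0 && height > 0);
    assert(bits_per_pixel > 0 && frames_per_second > 0);
    /* Widen first so the product cannot overflow 32 bits. */
    return (uint64_t)width * height * bits_per_pixel * frames_per_second;
}

/* Compression ratio needed to fit a raw stream into a given channel. */
double required_compression_ratio(uint64_t raw_bps, uint64_t channel_bps)
{
    assert(channel_bps > 0);
    return (double)raw_bps / (double)channel_bps;
}
```

For example, 1920x1080 at 24 bits per pixel and 30 frames per second is about 1.49 Gbit/s uncompressed, so a hypothetical 10 Mbit/s channel would need roughly 149:1 compression.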
Linux System Options (Linux for Image Processing, Camera Interfacing and Computer Vision)
Processing Outline
Many-Core Linux Host(s)
– Intel Atom
– ARM
– Xeon
GP-GPU Vector Processing PCI-E Co-Processors
– NVIDIA Tesla/Fermi
– AMD ATI
NPTL – Native POSIX Threads Library
– NPTL Example Code Walkthrough
Conceptual View of RT Resources
Three-Space View of Utilization Requirements
– CPU Margin?
– IO Latency (and Bandwidth) Margin?
– Memory Capacity (and Latency) Margin?
[Figure: three-axis utilization space with axes CPU-Utility, IO-Utility and Memory-Utility]
Upper Right Front Corner – Low-Margin
Origin – High-Margin
Mobile – Must Consider Battery Life Too (Power)
Processing – Initial Focus
Processing and Scaling Frame Transformation, Encode, Decode is Critical
Memory for Buffering (Frame Transformations, CPU Integrated or GPU Offloaded – e.g. Linux VDPAU)
I/O for Networking (Transport)
I/O for Storage (On-Demand, Post, Non-Linear Editing)
Flynn’s Computer Architecture Taxonomy
Single Instruction, Single Data: SISD (Traditional Uniprocessor)
Multiple Instruction, Single Data: MISD (Voting schemes and active-active controllers)
Single Instruction, Multiple Data: SIMD (e.g. SSE 4.2, GPGPU, Vector Processing)
Multiple Instruction, Multiple Data: MIMD (Distributed systems (MPMD), Clusters with MPI/PVM (SPMD), AMP/SMP)
GPC has gone MIMD with SIMD Instruction Sets and SIMD Offload (GP-GPU)
NUMA vs. UMA (Trend away from UMA to NUMA or MCH vs. IOH)
SMP with One OS (Shared Memory, CPU-balanced Interrupt Handling, Process Load Balancing, Multi-User, Multi-Application, CPU Affinity Possible)
MIMD - Single Program Multi-Data vs. Multi-Program Multi-Data
Computer and Machine Vision
Treated as a Real-time and/or Interactive System
– Requires Predictable Response (By Deadline)
– Rate Monotonic
– Earliest Deadline First
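One common way to check "predictable response by deadline" under Rate Monotonic scheduling is the Liu and Layland least upper bound, U <= n(2^(1/n) - 1). The sketch below assumes periodic services with deadlines equal to periods; the function names and task values are hypothetical, and the n-th root is computed by bisection purely to avoid a math-library dependency.

```c
#include <assert.h>
#include <stddef.h>

/* n-th root of 2 by bisection on [1, 2]. */
static double nth_root_of_two(unsigned n)
{
    double lo = 1.0, hi = 2.0;
    for (int iter = 0; iter < 60; iter++) {
        double mid = 0.5 * (lo + hi);
        double p = 1.0;
        for (unsigned k = 0; k < n; k++)
            p *= mid;                 /* p = mid^n */
        if (p < 2.0)
            lo = mid;
        else
            hi = mid;
    }
    return 0.5 * (lo + hi);
}

/* Rate Monotonic least upper bound on utilization for n services. */
double rm_least_upper_bound(unsigned n)
{
    assert(n > 0);
    return (double)n * (nth_root_of_two(n) - 1.0);
}

/* Total CPU utilization: sum of C_i / T_i (execution time over period). */
double total_utilization(const double *c, const double *t, size_t n)
{
    double u = 0.0;
    for (size_t i = 0; i < n; i++) {
        assert(t[i] > 0.0 && c[i] >= 0.0);
        u += c[i] / t[i];
    }
    return u;
}

/* Returns 1 when the sufficient test passes, 0 when it is inconclusive. */
int rm_lub_feasible(const double *c, const double *t, size_t n)
{
    assert(n > 0);
    return total_utilization(c, t, n) <= rm_least_upper_bound((unsigned)n);
}
```

The bound is sufficient but not necessary: a service set that fails it may still meet all deadlines, so a 0 result calls for a more exact analysis rather than rejection.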
CPU Scheduling Taxonomy
[Figure: taxonomy tree rooted at Execution Scheduling; node labels recovered from the slide, grouping inferred]
Execution Scheduling
– Local-Uniprocessor: Preemptive, Hybrid, Non-Preemptive
– Global-MP: Dynamic, Static; Symmetric (SMP OS), SMT (Micro-Parallel), Asymmetric Distributed (AMP)
   (Preemptive, Non-Preemptive Subtree Under Each Global-MP Leaf)
Preemptive
– Fixed-Priority: Rate Monotonic, Deadline Monotonic
– Dynamic-Priority: EDF/LLF
Hybrid
– RR Timeslice (desktop)
Non-Preemptive
– Batch: FCFS, SJN
– Cooperative: Multi-Frequency Co-Routine Executives, Continuation Function
– Dataflow
– Heuristic
Response Latency
Response Time = Time(Actuation) - Time(Sensed) (From Release to Response)
Components: Input/Output Latency, Interference Time, Execution Time Ci (WCET)
[Figure: timeline along the Time axis from Event Sensed to Actuation]
Events: Event Sensed, Interrupt, Dispatch, Preemption, Dispatch, Completion (IO Queued), Actuation (IO Completion)
Intervals: Input-Latency, Dispatch-Latency, Execution, Interference, Execution, Output-Latency
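The response-time definition above can be measured directly from two timestamps. The sketch below is one hedged way to do that with POSIX struct timespec values; the helper names are hypothetical, and it assumes both timestamps come from the same clock.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Elapsed nanoseconds from start to stop. */
int64_t timespec_delta_ns(const struct timespec *start,
                          const struct timespec *stop)
{
    assert(start != NULL && stop != NULL);
    return (int64_t)(stop->tv_sec - start->tv_sec) * NSEC_PER_SEC
         + (int64_t)(stop->tv_nsec - start->tv_nsec);
}

/* Response time = time of actuation - time the event was sensed. */
int64_t response_time_ns(const struct timespec *sensed,
                         const struct timespec *actuated)
{
    return timespec_delta_ns(sensed, actuated);
}

/* Returns 1 when the response completed by its relative deadline. */
int met_deadline(int64_t response_ns, int64_t deadline_ns)
{
    return response_ns <= deadline_ns;
}
```

In practice the two timestamps would typically be taken with clock_gettime(CLOCK_MONOTONIC, ...) at the sensing interrupt or frame arrival and again at output completion, so that the measurement is not disturbed by wall-clock adjustments.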
SIMD Vector Instructions
Intel MMX, SSE 1, 2, 3, 4.x
Code Generation Using SIMD Extensions to Accelerate Algorithms (Edge Enhancement)
– http://software.intel.com/en-us/articles/using-intelstreaming-simd-extensions-and-intel-integratedperformance-primitives-to-accelerate-algorithms/
[Figure: PSF]
Offload, Co-Proc, Vector Proc
1. GPU (Graphics Processing Units)
– Evolved for Consumer CGI and Games
   Physics Engines
   3D Rendering + Texture (4D Vector Operations)
   Game Engines and Simulation
   HD Output: HDMI, HD-SDI, Headless GP-GPU
– Higher End Used for Digital Cinema / Post Production, Broadcast
   PNY Quadro FX
   NVIDIA CUDA for Post
– GP-GPU Being Used to Accelerate Encode, Transcode, Trans-rate, etc. - http://www.elementaltechnologies.com/
2. Built-In SIMD Instruction Set Extensions
– Intel SSE
GP-GPU, What Is It?
Ideal for Large Bitwise, Integer, and Floating Point Vector Math
Flynn’s Taxonomy SIMD Architecture often leverages GP-GPU Co-Processors or Cell for MPMD
Single Instruction/Prog, Single Data: SISD (Traditional Uniprocessor)
Multiple Instruction, Single Data: MISD (Voting schemes and active-active controllers)
Single Instruction/Prog, Multiple Data: SIMD (SSE 4.2, Vector Processing), SPMD (Single Program Multiple Data), GP-GPU
Multiple Instruction, Multiple Data: MIMD (Distributed systems (MPMD), Clusters with MPI/PVM (SPMD), AMP/SMP)
SSE – Streaming SIMD Extensions
128-bit registers known as XMM0 through XMM7 (XMM0 through XMM15 in 64-bit mode)
Large Operands and Operators (Multi-Word)
E.g. 128-bit XOR of Two Operands
Multiple Multiply and Accumulate Operations for Floating Point (DSP Kernel Operations)
– E.g. 4 Component Vector addition
– 4 Single Precision Pixel Multiply and Accumulate in Single Instruction

Scalar version (16 operations to load 2 operands, add, store):
    vec_res.x = v1.x + v2.x;
    vec_res.y = v1.y + v2.y;
    vec_res.z = v1.z + v2.z;
    vec_res.w = v1.w + v2.w;

SSE version (3 SSE operations to load, add, store):
    movaps xmm0,address-of-v1          ;xmm0=v1.w | v1.z | v1.y | v1.x
    addps xmm0,address-of-v2           ;xmm0=v1.w+v2.w | v1.z+v2.z | v1.y+v2.y | v1.x+v2.x
    movaps address-of-vec_res,xmm0
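The same four-component addition can be written in C with compiler intrinsics instead of hand-written assembly. The sketch below is an illustration only: the vec4 type and function name are hypothetical, it uses unaligned loads and stores (the movups form rather than movaps) so the caller does not need 16-byte aligned data, and it falls back to scalar code when the compiler does not report SSE support.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE intrinsics: _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */
#endif

/* Four packed single-precision components, laid out contiguously. */
typedef struct {
    float x, y, z, w;
} vec4;

/* vec_res = v1 + v2, component-wise. */
void vec4_add(const vec4 *v1, const vec4 *v2, vec4 *vec_res)
{
    assert(v1 != NULL && v2 != NULL && vec_res != NULL);
#if defined(__SSE__)
    __m128 a = _mm_loadu_ps(&v1->x);                /* load x, y, z, w of v1 */
    __m128 b = _mm_loadu_ps(&v2->x);                /* load x, y, z, w of v2 */
    _mm_storeu_ps(&vec_res->x, _mm_add_ps(a, b));   /* one packed add, then store */
#else
    /* Scalar fallback for targets without SSE. */
    vec_res->x = v1->x + v2->x;
    vec_res->y = v1->y + v2->y;
    vec_res->z = v1->z + v2->z;
    vec_res->w = v1->w + v2->w;
#endif
}
```

Modern compilers will often auto-vectorize the scalar form as well, so the intrinsic version mainly serves to make the SIMD mapping explicit.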
Scheduling Parallel/Cluster HW
MIMD
– OS SMP threading, provides load balancing, affinity operations, routable interrupts (e.g. MSI-X), e.g. NPTL
– RTOS AMP is most often used in Embedded Systems
MPMD
– OpenCL, CUDA, DirectCompute (DirectX extension)
– Intel OpenMP, Linux Cluster, MPI
How Does NPTL Work?
No Thread Manager or M-on-N Mapping
– Previous POSIX Threading Model
– Manager Becomes Bottleneck
– Two-Level Scheduling Not Deterministic
– Many Pthreads (M) to N Kernel Threads Still an Issue
– O(n) Scheduling for each Manager
Direct Mapping of User to Kernel Thread or 1-to-1
– User Space Pthread Maps Directly onto Kernel Thread (Requires Root privilege)
– Deterministic (Remaining Non-Determinism is due to Kernel Preemptability Issues)
– O(1) Scheduling
Scheduling Policies Selectable Similar to RTOS Tasking
Linux NPTL Scheduling Policies
Fixed Priority Preemptive
– SCHED_FIFO – This is Priority Preemptive
– SCHED_RR – Priority Preemptive with Round-Robin Timeslicing among Equal Priorities (Fair, but at Kernel Level)
– SCHED_OTHER – This is OS default and should not be used for real-time services
POSIX Threads have
– Policy (FIFO, RR, OTHER)
– Priority (RT min to RT max)
– Creation (Fork)
– Join (Wait for thread completion at rendezvous)
– Synchronization Methods
   Semaphores
   Message Queues
– Asynchronous Communication Methods
   Signals
   Queued Signals
POSIX RT Extensions Include
– Virtual Timer Services
– Signals Tied to Timer Services
– Priority Inversion Protection (Availability on Linux TBD)
NPTL Coding Code Walk-through
July 7, 2004
Thread Scheduling Policy

    /* Request explicit (non-inherited) SCHED_FIFO scheduling for new threads */
    pthread_attr_init(&rt_sched_attr);
    pthread_attr_setinheritsched(&rt_sched_attr, PTHREAD_EXPLICIT_SCHED);
    pthread_attr_setschedpolicy(&rt_sched_attr, SCHED_FIFO);

    /* Query the SCHED_FIFO priority range */
    rt_max_prio = sched_get_priority_max(SCHED_FIFO);
    rt_min_prio = sched_get_priority_min(SCHED_FIFO);

    /* Run the calling process just below the maximum RT priority */
    rt_param.sched_priority = rt_max_prio-1;
    rc=sched_setscheduler(getpid(), SCHED_FIFO, &rt_param);
    if (rc) perror("sched_setscheduler");

    /* Report the contention scope of the attribute object */
    pthread_attr_getscope(&rt_sched_attr, &scope);
    if(scope == PTHREAD_SCOPE_SYSTEM)
        printf("PTHREAD SCOPE SYSTEM\n");
    else if (scope == PTHREAD_SCOPE_PROCESS)
        printf("PTHREAD SCOPE PROCESS\n");
    else
        printf("PTHREAD SCOPE UNKNOWN\n");
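Thread Creation and Join

    /* Create the thread with the prepared scheduling attributes */
    rc = pthread_create(&main_thread, &main_sched_attr, testThread, (void *)0);
    if (rc)
    {
        /* pthread_create() returns the error number rather than setting errno,
           so report it with strerror(rc) (needs <string.h>) */
        printf("ERROR; pthread_create() rc is %d (%s)\n", rc, strerror(rc));
        exit(-1);
    }

    /* Wait for the thread to finish, then release the attribute object */
    pthread_join(main_thread, NULL);
    if(pthread_attr_destroy(&rt_sched_attr) != 0)
        perror("attr destroy");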
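The walkthrough fragments can be combined into one self-contained sketch. This is not the lecture's original program: the worker function, the run_test_thread() wrapper, and the fallback to default attributes are assumptions added so the example still works when the process lacks the privilege needed for SCHED_FIFO.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical worker: stores a value so the caller can see that it ran. */
static void *test_thread(void *arg)
{
    int *result = arg;
    *result = 42;
    return NULL;
}

/*
 * Try to run test_thread under SCHED_FIFO; if that is refused (for example,
 * no real-time privilege), fall back to default attributes.
 * Returns 0 on success or a pthread error number.
 * *used_fifo is set to 1 when the real-time policy was applied, else 0.
 */
int run_test_thread(int *result, int *used_fifo)
{
    pthread_attr_t rt_sched_attr;
    struct sched_param rt_param;
    pthread_t thread;
    int rc;

    assert(result != NULL && used_fifo != NULL);

    rc = pthread_attr_init(&rt_sched_attr);
    if (rc != 0)
        return rc;

    /* Explicit policy and priority instead of inheriting the creator's. */
    pthread_attr_setinheritsched(&rt_sched_attr, PTHREAD_EXPLICIT_SCHED);
    pthread_attr_setschedpolicy(&rt_sched_attr, SCHED_FIFO);
    memset(&rt_param, 0, sizeof(rt_param));
    rt_param.sched_priority = sched_get_priority_max(SCHED_FIFO) - 1;
    pthread_attr_setschedparam(&rt_sched_attr, &rt_param);

    *used_fifo = 1;
    rc = pthread_create(&thread, &rt_sched_attr, test_thread, result);
    if (rc != 0) {
        /* pthread functions return the error number; they do not set errno. */
        fprintf(stderr, "SCHED_FIFO create failed (%s), using default policy\n",
                strerror(rc));
        *used_fifo = 0;
        rc = pthread_create(&thread, NULL, test_thread, result);
    }
    if (rc == 0)
        rc = pthread_join(thread, NULL);   /* rendezvous with the worker */

    pthread_attr_destroy(&rt_sched_attr);
    return rc;
}
```

Build with the compiler's pthread option (for example -pthread with GCC). Without real-time privilege the first pthread_create() usually fails with EPERM and the sketch continues under the default policy.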
Issues Beyond Policy and Feasibility
Throughput
Latency
How do they Differ?
– E.g. Frame Rate vs. Time to First Frame
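The frame rate versus time-to-first-frame distinction can be made concrete with a small calculation over presentation timestamps. The sketch below is illustrative only; the function names are hypothetical and the timestamps are assumed to be in seconds on a common clock.

```c
#include <assert.h>
#include <stddef.h>

/* Latency: time from the request until the first frame is presented. */
double time_to_first_frame_s(double request_s,
                             const double *frame_times_s, size_t n)
{
    assert(frame_times_s != NULL && n > 0);
    return frame_times_s[0] - request_s;
}

/* Throughput: average frames per second from the first to the last frame. */
double average_frame_rate(const double *frame_times_s, size_t n)
{
    assert(frame_times_s != NULL && n >= 2);
    double span = frame_times_s[n - 1] - frame_times_s[0];
    assert(span > 0.0);
    /* n frames span n - 1 frame intervals. */
    return (double)(n - 1) / span;
}
```

A stream can score well on one measure and poorly on the other: deep buffering may sustain a high frame rate while adding seconds to the time to first frame.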
Digital Video (Quick Reminders)
Simple Encode/Decode is Processing Intensive
GPU Co-Processors Can Offload CPU
Example with Mplayer and VDPAU (Video Decode and Presentation API for Unix) for Linux
Core Loading with Mplayer
[Figure: VDPAU MPEG Decode (Load balancing and offload)]
[Figure: Dual-Core SW Decode (Load balancing)]
Discussion – What Does Eye See?
Ewald Hering (1872), Opponent Colors (R/G, Y/B)
– Red and Green Opponent Colors – Can’t See Both Simultaneously
– Yellow and Blue Opponent Colors
Color Models
– RGB Cube
– HSV - Hue/Saturation/Value
   Hue – Similarity to R, G, Y, B
   Saturation – Color vs. Brightness
   Value – Low=Black, High=Color
– Luminance (Candela/Square-Meter) – Light Passing Through Area Forming a Solid Angle in A Direction
   Candela (Luminous Intensity) = Lumens/Steradian, i.e. Photometrically Weighted Watts/Steradian
   More Precise than “Brightness”
– Chrominance (“CrCb” or “UV” in YCrCb or YUV)
   U = Blue - Luminance (Y)
   V = Red - Luminance (Y)
– Wavelength Spectrum - ROYGBIV
[Figure: RGB Cube] http://en.wikipedia.org/wiki/File:RGB_Cube_Show_lowgamma_cutout_b.png
[Figure: HSV Cylinder] http://en.wikipedia.org/wiki/File:HSV_color_solid_cylinder_alpha_lowgamma.png
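The chrominance definitions above (U as blue minus luminance, V as red minus luminance) can be sketched in a few lines of C. The luma weights used here are the ITU-R BT.601 values, which is an assumption since the slide does not name a standard; the type and function names are hypothetical, and practical YUV or YCbCr encodings additionally scale and offset the two differences.

```c
#include <assert.h>

/* Luminance plus the two unscaled color differences. */
typedef struct {
    double y;   /* luminance */
    double u;   /* blue - luminance */
    double v;   /* red - luminance */
} yuv_t;

/* Convert one RGB pixel to luminance and color differences. */
yuv_t rgb_to_color_difference(double r, double g, double b)
{
    yuv_t out;
    assert(r >= 0.0 && g >= 0.0 && b >= 0.0);
    /* Assumed BT.601 luma weights; green contributes most to brightness. */
    out.y = 0.299 * r + 0.587 * g + 0.114 * b;
    out.u = b - out.y;
    out.v = r - out.y;
    return out;
}
```

A gray pixel (equal R, G and B) yields U and V near zero, which is why the luminance channel alone carries a usable monochrome picture.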
Frame Analysis and Image Processing Resources for Raw Frame Data
GIMP (GNU Image Manipulation Program) – Single Frame Analysis and Transforms
Octave – Similar to MATLAB
Irfanview – Simple Viewer includes PPM
OpenCV (C/C++ and Python API)
Single Frame Viewing and Analysis
– http://www.irfanview.com/
– http://www.gimp.org/downloads/
Image Processing Libraries
– http://cimg.sourceforge.net/
– http://opencv.org/
Practice with Linux
GIMP PPM and JPEG Frame Analysis
FFMPEG MPEG-4 DV to Frames
Sobel Image Transformation Real-Time
– http://www.cse.uaa.alaska.edu/~ssiewert/a485_code/capturetransformer/
Sobel Image Transformation Batch Mode
FFMPEG Re-encoding
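For readers who have not seen the Sobel transformation before, the sketch below shows the standard 3x3 Sobel kernels applied to an 8-bit grayscale frame. It is not taken from the capturetransformer code linked above; the function names are hypothetical, and the gradient magnitude uses the common |Gx| + |Gy| approximation rather than the square root of the sum of squares.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Sobel gradient magnitude (|Gx| + |Gy|) at an interior pixel of a
   row-major 8-bit grayscale image. */
int sobel_magnitude_at(const unsigned char *img, size_t width, size_t height,
                       size_t row, size_t col)
{
    static const int kx[3][3] = { {-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1} };
    static const int ky[3][3] = { {-1, -2, -1}, {0, 0, 0}, {1, 2, 1} };
    int gx = 0, gy = 0;

    assert(img != NULL);
    assert(row >= 1 && col >= 1 && row + 1 < height && col + 1 < width);

    for (size_t i = 0; i < 3; i++) {
        for (size_t j = 0; j < 3; j++) {
            int p = img[(row - 1 + i) * width + (col - 1 + j)];
            gx += kx[i][j] * p;   /* horizontal gradient */
            gy += ky[i][j] * p;   /* vertical gradient */
        }
    }
    return abs(gx) + abs(gy);
}

/* Transform a whole frame: clamp magnitudes to 0..255, zero the border. */
void sobel_frame(const unsigned char *in, unsigned char *out,
                 size_t width, size_t height)
{
    assert(in != NULL && out != NULL && in != out);
    for (size_t r = 0; r < height; r++) {
        for (size_t c = 0; c < width; c++) {
            int m = 0;
            if (r >= 1 && c >= 1 && r + 1 < height && c + 1 < width)
                m = sobel_magnitude_at(in, width, height, r, c);
            out[r * width + c] = (unsigned char)(m > 255 ? 255 : m);
        }
    }
}
```

In batch mode a function like sobel_frame() would be applied to each extracted frame in turn; in real-time mode the same per-frame work has to complete within one frame period.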