ABSTRACT
Elevated temperatures impact the performance, power consumption, and reliability of processors, which rely on integrated thermal sensors to measure runtime thermal behavior.
1. INTRODUCTION
Dynamic thermal management (DTM) techniques allow processors to optimize performance while avoiding thermal violations. The best-known DTM techniques include clock gating, dynamic voltage and frequency scaling (DVFS), and thread migration/scheduling [14, 18, 6, 5].
2. RELATED WORK AND MOTIVATION
3. PROPOSED PHASE-AWARE THERMAL PREDICTION METHODOLOGY
At the highest level, our phase-aware thermal prediction approach takes raw performance-counter data, measured periodically for each core during workload execution, and translates it into a temperature projection for some interval into the future using the concept of workload phases.
To define workload phases and capture the temperature dynamics within them in a computationally efficient manner, we propose the methodology illustrated in Figure 2.
3.1 Offline Thermal Phase Analysis
To avoid excessive runtime overhead, global phase analysis and within-phase temperature modeling are performed offline, using data generated for a set of representative workloads.
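The offline step amounts to grouping counter samples from representative workloads into a small set of global phases. As a minimal sketch, a tiny k-means over counter vectors stands in below for whatever phase-analysis algorithm the paper actually uses; the data and cluster count are invented for illustration.

```python
# Illustrative offline phase analysis: cluster performance-counter
# vectors from representative workloads into k global phases.
# A stdlib-only k-means is used as a stand-in clustering algorithm.
import random

def kmeans(points, k, iters=50, seed=0):
    """Cluster a list of equal-length tuples; return the k phase centroids."""
    random.seed(seed)
    centers = random.sample(points, k)
    for _ in range(iters):
        # Assign each counter vector to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            clusters[i].append(p)
        # Recompute each centroid as the mean of its assigned vectors.
        centers = [
            tuple(sum(d) / len(c) for d in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers
```

Each resulting centroid would define one phase; a per-phase temperature model is then fit to the samples assigned to that phase, and only the lightweight classify-and-predict step runs online.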
3.2 Runtime Thermal Prediction and Control
4. EXPERIMENTAL RESULTS
A. Experimental Infrastructure
5. CONCLUSIONS
Friday, August 20, 2010
Wednesday, August 18, 2010
paper 5.1 Thermal Monitoring of Real Processors: Techniques for Sensor Allocation and Full Characterization
ABSTRACT
1. INTRODUCTION
2. BACKGROUND AND PREVIOUS WORK
3. FREQUENCY DOMAIN TECHNIQUES
4. PROPOSED THERMAL SENSOR ALLOCATION TECHNIQUES
5. PROPOSED FULL RUNTIME THERMAL CHARACTERIZATION TECHNIQUES
A. k-LSE using Pre-determined Thermal Characterization
B. Compressive Sensing
6. EXPERIMENTAL RESULTS
7. CONCLUSIONS
Tuesday, August 17, 2010
paper 4.4 An Effective GPU Implementation of Breadth-First Search
ABSTRACT
1. INTRODUCTION
2. PREVIOUS APPROACHES
3. OUR GPU SOLUTION
3.1 Overview of CUDA on the Nvidia GTX280
3.2 Hierarchical Queue Management
3.3 Hierarchical Kernel Arrangement
4. EXPERIMENTAL RESULTS
Friday, August 13, 2010
paper 4.3 Timing Analysis of Esterel Programs on General-purpose Multiprocessors Lei
ABSTRACT
1. INTRODUCTION
2. OVERVIEW OF ESTEREL
3. CODE GENERATION
4. TIMING ANALYSIS
4.1 Computing Start Times
4.2 Inter-processor Infeasible Paths
4.3 WCET Calculation of a Basic Block
4.4 WCRT Analysis
5. EXPERIMENTAL RESULTS
Tuesday, August 10, 2010
paper 4.2 A Probabilistic and Energy-Efficient Scheduling Approach for Online Application in Real-Time Systems
ABSTRACT
1. INTRODUCTION
2. PRELIMINARIES
2.1 System and Task Model
2.2 Motivating Example
2.3 General Scheduling Concept
2.4 Expected Energy and Time Demand
3. ENERGY MINIMIZATION PROBLEM
4. SOLUTION
4.1 Relaxed Energy Minimization Problem
4.2 General Energy Minimization Problem
4.3 Implementation
5. EXPERIMENTS
5.1 Experimental Setup
5.2 Results
6. CONCLUSIONS
Friday, August 6, 2010
Paper 4.1 LATA: A Latency and Throughput-Aware Packet Processing System
ABSTRACT
1. INTRODUCTION
2. LATA SYSTEM DESIGN
2.1 Program Representation
2.2 Communication Measurement
2.3 Problem Statement
2.4 DAG Generation
3. LATA SCHEDULING, REFINEMENT AND MAPPING
3.1 List-based Pipeline Scheduling Algorithm
3.2 Search-based Refinement Process
3.2.1 Latency Reduction
3.2.2 Throughput Improvement
3.3 Cache-Aware Resource Mapping
3.3.1 Pre-mapping
3.3.2 Real Mapping
4. EXPERIMENT FRAMEWORK
5. PERFORMANCE EVALUATION
5.1 Comparison with Parallel System
5.2 Comparison with Three NP Systems
5.3 Latency Constraint Effect
5.4 Scalability Performance of LATA
5.5 Instruction Cache Size Performance
Tuesday, August 3, 2010
Paper 3.3 Online SystemC Emulation Acceleration
ABSTRACT
1. INTRODUCTION
2. RELATED WORK
3. SYSTEMC IN-SYSTEM EMULATION ARCHITECTURE
3.1 Base Architecture with Acceleration Engines
3.2 Kernel Bypass
4. ONLINE ACCELERATION ASSIGNMENT
4.1 Problem Definition
4.2 Communication Overhead
5. HEURISTICS
5.1 Upper and Lower Bounds
5.2 Accelerator Static Assignment
5.3 Greedy Heuristic
5.4 Aggregate Gain
6. EXPERIMENTS
6.1 Framework
6.2 Evaluation