ABSTRACT
Many DTM schemes rely heavily on the accurate knowledge
of the chip’s dynamic thermal state to make optimal performance/
temperature trade-off decisions.
1. INTRODUCTION
Most of today’s high-performance multi-core processors suffer from heavy power/thermal stress.
(page 1, col 2)
If the statistical characteristics of the power dissipation
profile do not change over time, a Kalman-filter-based approach
can generate optimal thermal estimates from sensor observations
[4].
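To make the idea concrete, here is a minimal scalar Kalman tracker. The thermal model and every parameter value (a, b, q, r, p_in) are illustrative assumptions for the sketch, not taken from the paper:

```python
import numpy as np

# Hypothetical scalar thermal model: T[n+1] = a*T[n] + b*p_in + w[n],
# sensor reading y[n] = T[n] + v[n]; all values are illustrative.
a, b = 0.95, 0.4          # thermal decay and power-to-heat gain
q, r = 0.05, 1.0          # process and sensor noise variances
p_in = 10.0               # constant mean power dissipation

rng = np.random.default_rng(0)
T_true, T_est, P = 50.0, 50.0, 1.0
for n in range(200):
    # True dynamics with process noise
    T_true = a * T_true + b * p_in + rng.normal(0.0, np.sqrt(q))
    y = T_true + rng.normal(0.0, np.sqrt(r))   # noisy sensor reading
    # Kalman predict step
    T_pred = a * T_est + b * p_in
    P = a * P * a + q
    # Kalman update step
    K = P / (P + r)
    T_est = T_pred + K * (y - T_pred)
    P = (1.0 - K) * P

print(abs(T_est - T_true))   # small: estimate tracks true temperature
```

If the power statistics (q, p_in) are fixed and correct, the gain K settles to a steady-state value, which is what makes the steady-state variants used later in the paper possible.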
In this paper, we investigate the problem of
adaptive temperature tracking at runtime by considering the
dynamic changes in the statistical characteristics of the power
profile.
2. PRELIMINARY
2.1 System Dynamics
2.2 Kalman Filter Based Thermal Tracking
3. PROBLEM DEFINITION AND
CHALLENGES
The statistical characteristics of each potential power state
could be captured by simulating or experimenting with all
potential application sets (integer vs. floating point, scientific
vs. multimedia, etc.).
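A sketch of what "capturing the statistical characteristics" of a power state could mean in practice: estimate per-state mean and variance of power from traces. The traces below are synthetic stand-ins for simulation output, and the class names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic per-interval power traces for two hypothetical workload
# classes; in practice these would come from simulating
# representative applications (integer vs. floating point, etc.).
traces = {
    "integer":  rng.normal(8.0, 0.5, size=10_000),
    "floating": rng.normal(14.0, 1.5, size=10_000),
}

# Each power state is summarized by its mean power and variance,
# which parameterize the filter's process-noise model for that state.
power_states = {
    name: {"mean": float(t.mean()), "var": float(t.var())}
    for name, t in traces.items()
}
print(power_states)
```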
4. ADAPTIVE TRACKING BASED ON
RESIDUAL WHITENING
4.1 Autonomous Detection
In this section we explain how the Kalman filter can be used
to autonomously detect switches between power states.
Note that we use Cs instead of C[n|n−1], since we assume
the system has reached steady state.
Once again we use the steady-state gain Ks as a parameter.
(page 4)
Basically, we evaluate the error between the observation and the prediction (the innovation) and estimate its autocorrelation.
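A toy version of the residual-whitening idea: when the filter's model matches reality, the innovations are white; an unmodeled power-state switch leaves them correlated. The bias shape and confidence band below are illustrative assumptions, not the paper's test:

```python
import numpy as np

def lag1_autocorr(residuals):
    """Normalized lag-1 autocorrelation of an innovation sequence."""
    r = np.asarray(residuals) - np.mean(residuals)
    return float(np.dot(r[:-1], r[1:]) / np.dot(r, r))

rng = np.random.default_rng(2)
white = rng.normal(0, 1, 500)        # model matches: white innovations
# A model mismatch (e.g. an unmodeled power-state switch) leaves a
# slowly varying bias in the innovations, making them correlated.
biased = white + np.linspace(0.0, 4.0, 500)

# For N white samples the lag-1 autocorrelation is ~N(0, 1/N); a 95%
# band is 1.96/sqrt(N), so values far outside it flag a state switch.
thresh = 1.96 / np.sqrt(500)
print(lag1_autocorr(white), thresh)    # typically inside the band
print(lag1_autocorr(biased), thresh)   # far outside: switch detected
```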
DAC 2010
Wednesday, September 1, 2010
Friday, August 20, 2010
paper 5.2 Consistent Runtime Thermal Prediction and Control Through Workload Phase Detection
ABSTRACT
Elevated temperatures impact the performance, power consumption, and reliability of processors, which rely on integrated thermal sensors to measure runtime thermal behavior.
1. INTRODUCTION
Dynamic thermal management (DTM) techniques allow processors to optimize performance while avoiding thermal violations. The most well-known DTM techniques include clock gating, dynamic
voltage and frequency scaling (DVFS), and thread migration/scheduling [14, 18, 6, 5].
2. RELATED WORK AND MOTIVATION
3. PROPOSED PHASE-AWARE THERMAL PREDICTION METHODOLOGY
At the highest level, our phase-aware thermal prediction approach
takes raw performance counter data that is periodically measured
for each core during workload operation and translates this data
into a temperature projection for some interval into the future using
the concept of workload phases.
In order to define workload phases and capture temperature dynamics
within them in a computationally efficient manner, we propose
the methodology that is illustrated by Figure 2.
3.1 Offline Thermal Phase Analysis
In order to avoid excessive runtime overhead, global phase analysis
and within-phase temperature modeling are performed offline
using data generated for a set of representative workloads.
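A rough sketch of such an offline step: cluster per-interval performance-counter vectors into global phases (a tiny k-means) and attach a trivial per-phase temperature summary. The counter features, data, and temperatures are all synthetic assumptions; the paper's within-phase models are richer than a per-phase mean:

```python
import numpy as np

rng = np.random.default_rng(3)
# Synthetic per-interval counter vectors (e.g. IPC, memory-access
# rate) for two hypothetical phases: compute-bound and memory-bound.
compute = rng.normal([2.0, 0.1], 0.05, size=(100, 2))
memory  = rng.normal([0.5, 0.9], 0.05, size=(100, 2))
samples = np.vstack([compute, memory])

# Tiny k-means (k=2) to identify global phases offline.
centroids = samples[rng.choice(len(samples), 2, replace=False)]
for _ in range(20):
    d = np.linalg.norm(samples[:, None] - centroids[None], axis=2)
    labels = d.argmin(axis=1)
    new = []
    for k in (0, 1):
        pts = samples[labels == k]
        new.append(pts.mean(axis=0) if len(pts) else centroids[k])
    centroids = np.array(new)

# Stand-in for within-phase temperature modeling: the mean observed
# temperature per phase (synthetic temperatures here).
temps = np.where(np.arange(len(samples)) < 100, 62.0, 78.0)
temps = temps + rng.normal(0, 0.5, len(samples))
phase_temp = {k: float(temps[labels == k].mean()) for k in (0, 1)}
print(phase_temp)
```

At runtime, classifying the current counter vector to the nearest centroid would then select the corresponding phase model for the temperature projection.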
(page 3, col 2)
(page 4)
3.2 Runtime Thermal Prediction and Control
4. EXPERIMENTAL RESULTS
A. Experimental Infrastructure.
(page 5)
5. CONCLUSIONS
Wednesday, August 18, 2010
paper 5.1 Thermal Monitoring of Real Processors: Techniques for Sensor Allocation and Full Characterization
ABSTRACT
1. INTRODUCTION
2. BACKGROUND AND PREVIOUS WORK
3. FREQUENCY DOMAIN TECHNIQUES
4. PROPOSED THERMAL SENSOR ALLOCATION TECHNIQUES
5. PROPOSED FULL RUNTIME THERMAL CHARACTERIZATION TECHNIQUES
A. k-LSE using Pre-determined Thermal Characterization
B. Compressive Sensing
6. EXPERIMENTAL RESULTS
7. CONCLUSIONS
Tuesday, August 17, 2010
paper 4.4 An Effective GPU Implementation of Breadth-First Search
ABSTRACT
1. INTRODUCTION
2. PREVIOUS APPROACHES
3. OUR GPU SOLUTION
3.1 Overview of CUDA on the Nvidia GTX280
3.2 Hierarchical Queue Management
3.3 Hierarchical Kernel Arrangement
4. EXPERIMENTAL RESULTS
Friday, August 13, 2010
paper 4.3 Timing Analysis of Esterel Programs on General-purpose Multiprocessors
ABSTRACT
1. INTRODUCTION
2. OVERVIEW OF ESTEREL
3. CODE GENERATION
4. TIMING ANALYSIS
4.1 Computing Start Times
4.2 Inter-processor Infeasible Paths
4.3 WCET Calculation of a Basic Block
4.4 WCRT Analysis
5. EXPERIMENTAL RESULTS
Tuesday, August 10, 2010
paper 4.2 A Probabilistic and Energy-Efficient Scheduling Approach for Online Application in Real-Time Systems
ABSTRACT
1. INTRODUCTION
2. PRELIMINARIES
2.1 System and Task Model
2.2 Motivating Example
2.3 General Scheduling Concept
2.4 Expected Energy and Time Demand
3. ENERGY MINIMIZATION PROBLEM
4. SOLUTION
4.1 Relaxed Energy Minimization Problem
4.2 General Energy Minimization Problem
4.3 Implementation
5. EXPERIMENTS
5.1 Experimental Setup
5.2 Results
6. CONCLUSIONS
Friday, August 6, 2010
Paper 4.1 LATA: A Latency and Throughput-Aware Packet Processing System
ABSTRACT
1. INTRODUCTION
2. LATA SYSTEM DESIGN
2.1 Program Representation
2.2 Communication Measurement
2.3 Problem Statement
2.4 DAG Generation
3. LATA SCHEDULING, REFINEMENT AND MAPPING
3.1 List-based Pipeline Scheduling Algorithm
3.2 Search-based Refinement Process
3.2.1 Latency Reduction
3.2.2 Throughput Improvement
3.3 Cache-Aware Resource Mapping
3.3.1 Pre-mapping
3.3.2 Real Mapping
4. EXPERIMENT FRAMEWORK
5. PERFORMANCE EVALUATION
5.1 Comparison with Parallel System
5.2 Comparison with Three NP Systems
5.3 Latency Constraint Effect
5.4 Scalability Performance of LATA
5.5 Instruction Cache Size Performance