User-Level Checkpointing
• Definition of Checkpointing
  – A checkpoint is a copy of the computer's memory that is periodically saved to disk along with the current register settings (last instruction executed, etc.), which allows the program to be restarted from that point.
• User-level checkpointing is contained within the program itself. Like OS-level checkpointing, it saves a program's state for a later restart. Below are some reasons to incorporate checkpointing at the user level in your code.
  – Even with massively parallel systems, runtime for large models is often measured in days.
  – As the number of processors increases, there is a higher probability that one of the nodes your job is running on will suffer a hardware failure.
  – Not all operating systems support OS-level checkpointing.
  – Larger parallel models require more I/O to save the state of a program.
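The idea of a program saving its own state for a later restart can be sketched in C as follows. This is a minimal illustration, not a prescribed method: the `SimState` struct, the function names, and the "write to a temporary file, then rename" strategy are all assumptions made for the example. Replacing an existing file with `rename` is assumed to behave as it does on POSIX systems.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical simulation state: everything needed to resume the run. */
typedef struct {
    int    step;       /* last completed iteration */
    double values[4];  /* stand-in for the model's data */
} SimState;

/* Write the state to a temporary file, then rename it over the previous
   checkpoint, so a crash in the middle of a write leaves the last good
   copy untouched (assumes POSIX rename semantics). */
int save_checkpoint(const char *path, const SimState *s)
{
    char tmp[512];
    assert(path != NULL && s != NULL);
    if (snprintf(tmp, sizeof tmp, "%s.tmp", path) >= (int)sizeof tmp)
        return -1;
    FILE *f = fopen(tmp, "wb");
    if (f == NULL)
        return -1;
    size_t n = fwrite(s, sizeof *s, 1, f);
    if (fclose(f) != 0 || n != 1) {
        remove(tmp);
        return -1;
    }
    return rename(tmp, path) == 0 ? 0 : -1;
}

/* Load a checkpoint. Returns 0 on success, -1 if the file is missing
   or too short to hold a full state. */
int load_checkpoint(const char *path, SimState *s)
{
    assert(path != NULL && s != NULL);
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t n = fread(s, sizeof *s, 1, f);
    fclose(f);
    return n == 1 ? 0 : -1;
}

/* Advance from s->step to last_step, saving every `interval` steps. */
int run(const char *path, SimState *s, int last_step, int interval)
{
    assert(interval > 0);
    while (s->step < last_step) {
        for (int i = 0; i < 4; i++)
            s->values[i] += 1.0;  /* placeholder for the real work */
        s->step++;
        if (s->step % interval == 0 && save_checkpoint(path, s) != 0)
            return -1;
    }
    return 0;
}
```

At startup, a program built this way would call `load_checkpoint` first and fall back to a fresh initial state if it returns -1, then call `run` to continue from whichever step was restored.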
Out-Of-Core Solvers
• In an out-of-core problem the entire amount of data does not fit in the local main memory of a processor and must be stored on disk.
  – Needed only if the OS does not support virtual memory
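As a rough illustration of the out-of-core pattern, the sketch below reduces a data set that lives on disk while keeping only one fixed-size chunk in memory at a time. The function name `out_of_core_sum`, the flat binary file of `double` values, and the choice of a simple sum as the computation are assumptions made for the example; a real solver would apply its own kernel to each chunk.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Sum all doubles stored in a binary file while holding at most
   chunk_elems of them in memory at any one time.
   Returns 0 on success, -1 on an open, allocation, or read error. */
int out_of_core_sum(const char *path, size_t chunk_elems, double *result)
{
    assert(path != NULL && result != NULL && chunk_elems > 0);
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    double *buf = malloc(chunk_elems * sizeof *buf);
    if (buf == NULL) {
        fclose(f);
        return -1;
    }
    double sum = 0.0;
    size_t n;
    /* Each pass pulls the next chunk from disk and reduces it;
       the final chunk may be shorter than chunk_elems. */
    while ((n = fread(buf, sizeof *buf, chunk_elems, f)) > 0)
        for (size_t i = 0; i < n; i++)
            sum += buf[i];
    int failed = ferror(f);
    free(buf);
    fclose(f);
    if (failed)
        return -1;
    *result = sum;
    return 0;
}
```

The chunk size is the tuning knob here: it bounds the memory footprint, and larger chunks generally mean fewer, bigger disk reads.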
Data Mining
• Applications that have minimal computations, but many references to online databases are excellent candidates for parallel I/O.
  – A phone book directory could be broken down to have chunks of the database distributed on each node
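One way to picture the phone book example is a block decomposition: each node is assigned a contiguous range of records, with the ranges as equal in size as possible. The helper below is a hypothetical sketch of that arithmetic only; the name `chunk_for_node` and the even-split policy are assumptions, and the source does not say how the chunks would actually be chosen.

```c
#include <assert.h>
#include <stddef.h>

/* Compute which contiguous slice of n_records belongs to `node` when
   the records are split as evenly as possible across n_nodes.
   On return, *first is the index of the node's first record and
   *count is how many records it holds. */
void chunk_for_node(size_t n_records, size_t n_nodes, size_t node,
                    size_t *first, size_t *count)
{
    assert(n_nodes > 0 && node < n_nodes);
    assert(first != NULL && count != NULL);
    size_t base  = n_records / n_nodes;  /* records every node gets */
    size_t extra = n_records % n_nodes;  /* leftovers, one each to the first nodes */
    *count = base + (node < extra ? 1 : 0);
    *first = node * base + (node < extra ? node : extra);
}
```

With 10 records on 4 nodes, this assigns 3, 3, 2, and 2 records starting at indices 0, 3, 6, and 8, so every record lands on exactly one node and each node can search its own chunk independently.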
Characteristics of Serial I/O
• To help understand what parallel I/O entails, it is beneficial to review some of the defining features of serial I/O and then compare the differences.
• Physical Structure
  – Generally there is one processor connected to one physical disk
• Logical Structure
  – Traditional view of I/O from high-level languages
  – Single file pointer
  – Access to files can be sequential or random
  – File parameters (read/write only, etc.)
  – Data can be written raw or formatted, binary or ASCII
  – Built-in function calls
    • C (fprintf, fscanf)
• Definition of File Pointer
  – An offset maintained by the system that points to the next position in a file to read from or write to
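The single file pointer and the difference between sequential and random access can be seen with standard C stream calls. The sketch below assumes a binary file of fixed-size `int` records and uses `fread`, `fseek`, and `rewind` rather than the formatted `fprintf`/`fscanf` calls named above; the helper names `read_record` and `count_records` are made up for the example.

```c
#include <assert.h>
#include <stdio.h>

/* Random access: move the file pointer straight to record `index`
   and read it, without reading the records before it.
   Returns 0 on success, -1 if the seek or the read fails. */
int read_record(FILE *f, long index, int *out)
{
    assert(f != NULL && out != NULL && index >= 0);
    if (fseek(f, index * (long)sizeof(int), SEEK_SET) != 0)
        return -1;
    return fread(out, sizeof *out, 1, f) == 1 ? 0 : -1;
}

/* Sequential access: every fread advances the file pointer by one
   record, so repeated reads walk the file from start to end.
   Returns the number of complete records found. */
long count_records(FILE *f)
{
    int  tmp;
    long n = 0;
    assert(f != NULL);
    rewind(f);  /* put the file pointer back at offset 0 */
    while (fread(&tmp, sizeof tmp, 1, f) == 1)
        n++;
    return n;
}
```

On a stream opened in binary mode, calling `ftell` after either helper reports the byte offset where the next read or write would begin, which is the file pointer the definition above describes.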